Paper title
Too good to be true? Predicting author profiles from abusive language
Paper authors
Paper abstract
The problem of online threats and abuse could potentially be mitigated with a computational approach in which sources of abuse are better understood or identified through author profiling. However, abusive language constitutes a specific domain of language, and it has not yet been tested whether differences in it emerge based on a text author's personality, age, or gender. This study examines statistical relationships between author demographics and abusive vs. normal language, and performs prediction experiments for personality, age, and gender. Although some statistical relationships were established between author characteristics and language use, these patterns did not translate to high prediction performance. Personality traits were predicted within 15% of their actual value, age was predicted with an error margin of 10 years, and gender was classified correctly in 70% of the cases. These results are poor when compared to previous research on author profiling; we therefore urge caution in applying author profiling within the context of abusive language and threat assessment.