Paper Title
Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia
Paper Authors
Paper Abstract
Specific lexical choices in narrative text both reflect the writer's attitudes towards people in the narrative and influence the audience's reactions. Prior work has examined descriptions of people in English using contextual affective analysis, a natural language processing (NLP) technique that seeks to analyze how people are portrayed along dimensions of power, agency, and sentiment. Our work extends this methodology to multilingual settings, which is enabled by a new corpus that we collect and a new multilingual model. We additionally show how word connotations differ across languages and cultures, highlighting the difficulty of generalizing existing English datasets and methods. We then demonstrate the usefulness of our method by analyzing Wikipedia biography pages of members of the LGBT community across three languages: English, Russian, and Spanish. Our results show systematic differences in how the LGBT community is portrayed across languages, surfacing cultural differences in narratives and signs of social biases. Practically, this model can be used to identify Wikipedia articles for further manual analysis, namely articles that might contain content gaps or an imbalanced representation of particular social groups.