Paper Title
Discovering and Interpreting Biased Concepts in Online Communities
Paper Authors
Paper Abstract
Language carries implicit human biases, functioning both as a reflection and a perpetuation of the stereotypes that people carry with them. Recently, ML-based NLP methods such as word embeddings have been shown to learn such language biases with striking accuracy. This capability of word embeddings has been successfully exploited as a tool to quantify and study human biases. However, previous studies only consider a predefined set of biased concepts to attest (e.g., whether gender is more or less associated with particular jobs), or merely discover biased words without helping to understand their meaning at the conceptual level. As such, these approaches either cannot find biased concepts that have not been defined in advance, or find biases that are difficult to interpret and study. This can make existing approaches unsuitable for discovering and interpreting biases in online communities, as such communities may carry biases different from those in mainstream culture. This paper improves upon, extends, and evaluates our previous data-driven method for automatically discovering and helping to interpret biased concepts encoded in word embeddings. We apply this approach to study the biased concepts present in the language used in online communities and experimentally show the validity and stability of our method.
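
The abstract notes that word embeddings have been exploited to quantify biases against a predefined set of concepts, e.g., whether gender is more or less associated with particular jobs. The following Python snippet is a minimal sketch of what such a predefined-concept association test can look like; it is not the authors' discovery method, and the embedding vectors, word lists, and function names are illustrative assumptions rather than anything from the paper.

    # Illustrative sketch (not the paper's method): measuring a predefined
    # bias in word embeddings via cosine-similarity associations, in the
    # spirit of the attestation-style tests the abstract refers to.
    import numpy as np

    def cosine(u, v):
        """Cosine similarity between two embedding vectors."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def association(word_vec, attr_a, attr_b):
        """Mean similarity to attribute set A minus mean similarity to
        attribute set B; positive values lean toward A."""
        return (np.mean([cosine(word_vec, a) for a in attr_a])
                - np.mean([cosine(word_vec, b) for b in attr_b]))

    # Hypothetical 4-d embeddings; a real study would load vectors trained
    # on an actual corpus (e.g., a word2vec model of a community's posts).
    rng = np.random.default_rng(0)
    emb = {w: rng.normal(size=4) for w in
           ["he", "she", "man", "woman", "engineer", "nurse"]}

    male_attrs = [emb["he"], emb["man"]]
    female_attrs = [emb["she"], emb["woman"]]
    for job in ["engineer", "nurse"]:
        print(job, association(emb[job], male_attrs, female_attrs))

The contrast drawn in the abstract is that a test like this can only attest biases someone thought to encode in the word lists in advance, whereas the paper's data-driven method aims to discover and help interpret biased concepts that were not predefined.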