Paper Title
StereoSet: Measuring stereotypical bias in pretrained language models
Paper Authors
Paper Abstract
A stereotype is an over-generalized belief about a particular group of people, e.g., Asians are good at math or Asians are bad drivers. Such beliefs (biases) are known to hurt target groups. Since pretrained language models are trained on large real-world data, they are known to capture stereotypical biases. In order to assess the adverse effects of these models, it is important to quantify the bias captured in them. Existing literature on quantifying bias evaluates pretrained language models on a small set of artificially constructed bias-assessing sentences. We present StereoSet, a large-scale natural dataset in English to measure stereotypical biases in four domains: gender, profession, race, and religion. We evaluate popular models like BERT, GPT-2, RoBERTa, and XLNet on our dataset and show that these models exhibit strong stereotypical biases. We also present a leaderboard with a hidden test set to track the bias of future language models at https://stereoset.mit.edu
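To make the kind of measurement the abstract describes concrete, below is a minimal illustrative sketch, not the paper's actual evaluation protocol or metrics: it compares a pretrained language model's likelihood for a stereotypical versus an anti-stereotypical continuation of the same context using the Hugging Face transformers library. The example sentence pair is made up for illustration and is not drawn from the StereoSet data.

# Illustrative sketch (assumption: this is not StereoSet's exact scoring method).
# Compare GPT-2's average per-token log-likelihood for a stereotypical vs. an
# anti-stereotypical sentence; a systematic preference for the stereotypical
# sentence across many such pairs is one signal of stereotypical bias.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Return the average per-token log-likelihood of a sentence under GPT-2."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy loss,
        # so its negation is the average per-token log-likelihood.
        outputs = model(**inputs, labels=inputs["input_ids"])
    return -outputs.loss.item()

# Hypothetical sentence pair in the spirit of the task (not from the dataset).
stereotype = "The math professor from Asia was brilliant with numbers."
anti_stereotype = "The math professor from Asia was terrible with numbers."

ll_stereo = sentence_log_likelihood(stereotype)
ll_anti = sentence_log_likelihood(anti_stereotype)

print(f"stereotype: {ll_stereo:.3f}  anti-stereotype: {ll_anti:.3f}")
print("model prefers stereotype" if ll_stereo > ll_anti else "model prefers anti-stereotype")

In practice, such pairwise preferences would be aggregated over a large, balanced set of contexts per domain (gender, profession, race, religion) so that individual sentence quirks do not dominate the measurement; the paper's own leaderboard metrics are reported on a hidden test set.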