Paper Title

Bias at a Second Glance: A Deep Dive into Bias for German Educational Peer-Review Data Modeling

Paper Authors

Thiemo Wambsganss, Vinitra Swamy, Roman Rietsche, Tanja Käser

Paper Abstract

Natural Language Processing (NLP) has become increasingly utilized to provide adaptivity in educational applications. However, recent research has highlighted a variety of biases in pre-trained language models. While existing studies investigate bias in different domains, they are limited in addressing fine-grained analysis on educational and multilingual corpora. In this work, we analyze bias across text and through multiple architectures on a corpus of 9,165 German peer-reviews collected from university students over five years. Notably, our corpus includes labels such as helpfulness, quality, and critical aspect ratings from the peer-review recipient as well as demographic attributes. We conduct a Word Embedding Association Test (WEAT) analysis on (1) our collected corpus in connection with the clustered labels, (2) the most common pre-trained German language models (T5, BERT, and GPT-2) and GloVe embeddings, and (3) the language models after fine-tuning on our collected data-set. In contrast to our initial expectations, we found that our collected corpus does not reveal many biases in the co-occurrence analysis or in the GloVe embeddings. However, the pre-trained German language models find substantial conceptual, racial, and gender bias and have significant changes in bias across conceptual and racial axes during fine-tuning on the peer-review data. With our research, we aim to contribute to the fourth UN sustainability goal (quality education) with a novel dataset, an understanding of biases in natural language education data, and the potential harms of not counteracting biases in language models for educational tasks.
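The WEAT analysis mentioned above measures bias as the differential association of two sets of target words (X, Y) with two sets of attribute words (A, B) in an embedding space. A minimal sketch of the standard WEAT effect size, assuming word vectors are given as numpy arrays (this is an illustration of the general method, not the paper's exact implementation or word lists):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity of word w to attribute set A
    # minus its mean similarity to attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Effect size: difference of the mean associations of the two
    # target sets, normalized by the pooled standard deviation
    # over all target words (X union Y)
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)
```

A positive effect size indicates that the X targets are more strongly associated with the A attributes (and Y with B) than vice versa; values near zero suggest no measurable association along that axis.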
