Paper Title
A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models
Paper Authors
Paper Abstract
Chemical named entity recognition (NER) models are used in many downstream tasks, from adverse drug reaction identification to pharmacoepidemiology. However, it is unknown whether these models work equally well for everyone. Performance disparities can cause harm rather than the intended good. This paper assesses gender-related performance disparities in chemical NER systems. We develop a framework for measuring gender bias in chemical NER models using synthetic data and a newly annotated corpus of 92,405 words with self-identified gender information from Reddit. Our evaluation of multiple biomedical NER models reveals evident biases. For instance, the synthetic data suggest that female-related names are frequently misclassified as chemicals, especially for brand name mentions. Additionally, we observe performance disparities between female- and male-associated data in both datasets. Many systems fail to detect contraceptives such as birth control. Our findings emphasize the biases in chemical NER models, urging practitioners to account for these biases in downstream applications.