Paper Title
What Changed? Investigating Debiasing Methods using Causal Mediation Analysis
Paper Authors
Abstract
Previous work has examined how debiasing language models affects downstream tasks: specifically, how debiasing techniques influence task performance and whether debiased models also make impartial predictions in downstream tasks. However, it is not yet well understood why debiasing methods have varying impacts on downstream tasks, or how debiasing techniques affect the internal components of language models, i.e., neurons, layers, and attention. In this paper, we decompose the internal mechanisms of debiasing language models with respect to gender by applying causal mediation analysis to understand the influence of debiasing methods on toxicity detection as a downstream task. Our findings suggest a need to test the effectiveness of debiasing methods with different bias metrics, and to focus on changes in the behavior of certain model components, e.g., the first two layers of language models and the attention heads.