Paper Title
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Paper Authors
Paper Abstract
The ability to control the kinds of information encoded in neural representations has a variety of use cases, especially in light of the challenge of interpreting these models. We present Iterative Null-space Projection (INLP), a novel method for removing information from neural representations. The method is based on repeated training of linear classifiers that predict a certain property we aim to remove, followed by projection of the representations onto the classifiers' null-space. In doing so, the classifiers become oblivious to that target property, making it hard to linearly separate the data according to it. While applicable to multiple uses, we evaluate the method on bias and fairness use cases, and show that it is able to mitigate bias in word embeddings, as well as to increase fairness in a multi-class classification setting.
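To make the iterative procedure concrete, below is a minimal sketch in Python/NumPy, assuming a matrix `X` of representations (one row per example) and a protected-attribute label vector `z`. It uses scikit-learn's `LinearSVC` as the linear classifier and composes null-space projections across iterations; the function names `nullspace_projection` and `inlp` are illustrative and this is a simplified sketch of the idea, not the authors' released implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def nullspace_projection(W):
    """Projection matrix onto the null-space of W (rows = classifier weights)."""
    # Orthonormal basis of the row space of W via SVD, then take its complement.
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    rank = int(np.sum(s > 1e-10))
    B = Vt[:rank]                       # (rank, d) orthonormal basis of the row space
    return np.eye(W.shape[1]) - B.T @ B

def inlp(X, z, n_iters=10):
    """Iteratively remove linearly decodable information about z from X.

    Returns a "guarding" projection P; X @ P.T is the cleaned representation.
    Simplified sketch of INLP (illustrative, not the reference code).
    """
    d = X.shape[1]
    P = np.eye(d)                       # accumulated projection
    X_proj = X.copy()
    for _ in range(n_iters):
        clf = LinearSVC(dual=False)     # linear classifier predicting the protected attribute
        clf.fit(X_proj, z)
        P_i = nullspace_projection(clf.coef_)
        P = P_i @ P                     # compose with the projections from earlier iterations
        X_proj = X_proj @ P_i           # P_i is symmetric, so right-multiplication projects rows
    return P
```

After running `inlp`, applying the returned projection to the representations (`X @ P.T`) should make the protected attribute hard to recover with a linear classifier, which is the "guarding" effect described in the abstract.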