Paper Title

Evaluating and Mitigating Bias in Image Classifiers: A Causal Perspective Using Counterfactuals

Paper Authors

Saloni Dash, Vineeth N Balasubramanian, Amit Sharma

Paper Abstract

Counterfactual examples for an input -- perturbations that change specific features but not others -- have been shown to be useful for evaluating bias of machine learning models, e.g., against specific demographic groups. However, generating counterfactual examples for images is non-trivial due to the underlying causal structure on the various features of an image. To be meaningful, generated perturbations need to satisfy constraints implied by the causal model. We present a method for generating counterfactuals by incorporating a structural causal model (SCM) in an improved variant of Adversarially Learned Inference (ALI) that generates counterfactuals in accordance with the causal relationships between attributes of an image. Based on the generated counterfactuals, we show how to explain a pre-trained machine learning classifier, evaluate its bias, and mitigate the bias using a counterfactual regularizer. On the Morpho-MNIST dataset, our method generates counterfactuals comparable in quality to prior work on SCM-based counterfactuals (DeepSCM), while on the more complex CelebA dataset our method outperforms DeepSCM in generating high-quality valid counterfactuals. Moreover, generated counterfactuals are indistinguishable from reconstructed images in a human evaluation experiment, and we subsequently use them to evaluate the fairness of a standard classifier trained on CelebA data. We show that the classifier is biased w.r.t. skin and hair color, and how counterfactual regularization can remove those biases.
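
As a rough illustration of the two ideas in the abstract, below is a minimal PyTorch sketch of (1) the standard abduction-action-prediction recipe for SCM-based counterfactual generation and (2) a counterfactual regularizer that penalizes prediction shifts between an image and its counterfactual. The `encoder`, `scm`, `generator`, and `classifier` objects, the `scm.intervene` method, and `lambda_cf` are hypothetical placeholders, not the paper's actual ALI-based architecture or API.

```python
import torch
import torch.nn.functional as F

def generate_counterfactual(encoder, scm, generator, image, attrs, intervention):
    """Abduction-action-prediction, the standard recipe for SCM counterfactuals.

    All model objects here are hypothetical stand-ins for trained components:
    `encoder` infers a latent code from the image (abduction), `scm` propagates
    an intervention through the attribute graph (action), and `generator`
    decodes the latent plus modified attributes back to an image (prediction).
    """
    z = encoder(image, attrs)                       # abduction: infer exogenous noise
    cf_attrs = scm.intervene(attrs, intervention)   # action: do(attribute = value),
                                                    # updating downstream attributes causally
    return generator(z, cf_attrs)                   # prediction: render the counterfactual

def counterfactual_regularizer(classifier, image, cf_image):
    """Penalize the gap between predictions on an image and on its counterfactual,
    e.g. one differing only in a sensitive attribute such as skin or hair color."""
    probs = torch.sigmoid(classifier(image))
    cf_probs = torch.sigmoid(classifier(cf_image))
    return F.mse_loss(cf_probs, probs)

# Training step (sketch): task loss plus the fairness penalty, where the
# hypothetical weight lambda_cf trades accuracy against counterfactual invariance.
#   loss = F.binary_cross_entropy_with_logits(classifier(image), labels) \
#          + lambda_cf * counterfactual_regularizer(classifier, image, cf_image)
```

In this reading, a classifier is counterfactually fair on an attribute when its predictions are (near-)invariant under interventions on that attribute alone, which is what driving the regularizer toward zero enforces.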
