Paper Title
Ethical Adversaries: Towards Mitigating Unfairness with Adversarial Machine Learning
Paper Authors
Paper Abstract
Machine learning is being integrated into a growing number of critical systems with far-reaching impacts on society. Unexpected behaviour and unfair decision processes are coming under increasing scrutiny due to this widespread use and its theoretical considerations. Individuals, as well as organisations, notice, test, and criticize unfair results to hold model designers and deployers accountable. We offer a framework that assists these groups in mitigating unfair representations stemming from the training datasets. Our framework relies on two inter-operating adversaries to improve fairness. First, a model is trained with the goal of preventing the guessing of protected attributes' values while limiting utility losses. This first step optimizes the model's parameters for fairness. Second, the framework leverages evasion attacks from adversarial machine learning to generate new examples that will be misclassified. These new examples are then used to retrain and improve the model from the first step. These two steps are applied iteratively until a significant improvement in fairness is obtained. We evaluated our framework on well-studied datasets in the fairness literature -- including COMPAS -- where it can surpass other approaches with respect to demographic parity, equality of opportunity, and the model's utility. We also illustrate our findings on the subtle difficulties of mitigating unfairness and highlight how our framework can assist model designers.
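The abstract describes a two-step loop: an adversary that tries to recover the protected attribute during training (step 1), and an evasion attack whose misclassified examples are fed back for retraining (step 2). The sketch below is a minimal, hypothetical illustration of that loop, not the authors' implementation: it uses a gradient-reversal adversary for step 1 and an FGSM-style perturbation for step 2; all network sizes, hyper-parameters, and the random toy data are placeholder assumptions.

```python
# Minimal sketch of the two inter-operating adversaries (assumed PyTorch setup;
# tabular features X, binary task label y, binary protected attribute s).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient-reversal layer: identity on the forward pass, negated gradient
    on the backward pass, so the encoder learns to fool the fairness adversary."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def train_fair_model(X, y, s, epochs=50, lr=1e-2):
    """Step 1: train a predictor while an adversary tries to guess the
    protected attribute s from the shared representation."""
    encoder = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU())
    predictor = nn.Linear(16, 1)   # predicts the task label y
    adversary = nn.Linear(16, 1)   # tries to recover the protected attribute s
    params = (list(encoder.parameters()) + list(predictor.parameters())
              + list(adversary.parameters()))
    opt = torch.optim.Adam(params, lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        h = encoder(X)
        task_loss = bce(predictor(h).squeeze(1), y)
        # Gradient reversal: the adversary learns to predict s, while the
        # encoder receives the opposite gradient and limits utility loss.
        adv_loss = bce(adversary(GradReverse.apply(h)).squeeze(1), s)
        (task_loss + adv_loss).backward()
        opt.step()
    return encoder, predictor

def evasion_attack(encoder, predictor, X, y, eps=0.1):
    """Step 2: craft FGSM-style evasion examples that the current model tends
    to misclassify; these are fed back into retraining."""
    X_adv = X.clone().requires_grad_(True)
    loss = nn.BCEWithLogitsLoss()(predictor(encoder(X_adv)).squeeze(1), y)
    loss.backward()
    return (X_adv + eps * X_adv.grad.sign()).detach()

# Iterate both steps; in the paper's framework the loop stops once fairness
# (e.g. demographic parity, measured externally) improves significantly.
X = torch.randn(256, 8)
y = torch.randint(0, 2, (256,)).float()
s = torch.randint(0, 2, (256,)).float()
for _ in range(3):
    encoder, predictor = train_fair_model(X, y, s)
    X = torch.cat([X, evasion_attack(encoder, predictor, X, y)])
    y, s = torch.cat([y, y]), torch.cat([s, s])
```

In this sketch the stopping criterion, attack strength, and adversary architecture are all assumptions; the actual framework evaluates fairness metrics such as demographic parity and equality of opportunity between rounds to decide when to stop.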