Paper Title
Fairness by Explicability and Adversarial SHAP Learning
Paper Authors
Paper Abstract
The ability to understand and trust the fairness of model predictions, particularly when considering the outcomes of unprivileged groups, is critical to the deployment and adoption of machine learning systems. SHAP values provide a unified framework for interpreting model predictions and feature attribution but do not address the problem of fairness directly. In this work, we propose a new definition of fairness that emphasises the role of an external auditor and model explicability. To satisfy this definition, we develop a framework for mitigating model bias using regularizations constructed from the SHAP values of an adversarial surrogate model. We focus on the binary classification task with a single unprivileged group and link our fairness explicability constraints to classical statistical fairness metrics. We demonstrate our approaches using gradient and adaptive boosting on: a synthetic dataset, the UCI Adult (Census) dataset and a real-world credit scoring dataset. The models produced were fairer and performant.
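As a concrete illustration of the idea in the abstract, the sketch below assembles a SHAP-based adversarial penalty: a surrogate model is trained to recover the protected attribute, and the SHAP attribution assigned to the primary model's score serves as a fairness penalty. This is a minimal sketch assuming Python with the `shap` and `scikit-learn` libraries; the helper name `shap_fairness_penalty` and the particular surrogate construction are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (assumptions noted above) of a SHAP-based
# adversarial fairness penalty in the spirit of the abstract.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier


def shap_fairness_penalty(primary_model, X, protected):
    """Return a scalar penalty measuring how strongly the primary
    model's score explains membership of the unprivileged group."""
    # Adversarial surrogate: predict the protected attribute from the
    # raw features augmented with the primary model's predicted score.
    scores = primary_model.predict_proba(X)[:, [1]]
    X_adv = np.hstack([X, scores])
    surrogate = GradientBoostingClassifier().fit(X_adv, protected)

    # SHAP values of the surrogate attribute its predictions to each
    # input column; the last column is the primary model's score.
    explainer = shap.TreeExplainer(surrogate)
    shap_vals = explainer.shap_values(X_adv)

    # If the score column carries large attribution mass, the primary
    # model's outputs encode the protected attribute; penalise it.
    return np.abs(np.asarray(shap_vals)[:, -1]).mean()
```

In a full training loop one would weight this penalty by a hyperparameter and add it to the primary model's loss before refitting; the abstract's gradient- and adaptive-boosting variants instead fold the SHAP-based constraint into the boosting objective itself, which this standalone sketch does not attempt.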