Towards A Holistic View of Bias in Machine Learning: Bridging Algorithmic Fairness and Imbalanced Learning

Paper Title

Towards A Holistic View of Bias in Machine Learning: Bridging Algorithmic Fairness and Imbalanced Learning

Paper Authors

Damien Dablain, Bartosz Krawczyk, Nitesh Chawla

Abstract

Machine learning (ML) is playing an increasingly important role in rendering decisions that affect a broad range of groups in society. ML models inform decisions in criminal justice, the extension of credit in banking, and the hiring practices of corporations. This posits the requirement of model fairness, which holds that automated decisions should be equitable with respect to protected features (e.g., gender, race, or age) that are often under-represented in the data. We postulate that this problem of under-representation has a corollary to the problem of imbalanced data learning. This class imbalance is often reflected in both classes and protected features. For example, one class (those receiving credit) may be over-represented with respect to another class (those not receiving credit) and a particular group (females) may be under-represented with respect to another group (males). A key element in achieving algorithmic fairness with respect to protected groups is the simultaneous reduction of class and protected group imbalance in the underlying training data, which facilitates increases in both model accuracy and fairness. We discuss the importance of bridging imbalanced learning and group fairness by showing how key concepts in these fields overlap and complement each other; and propose a novel oversampling algorithm, Fair Oversampling, that addresses both skewed class distributions and protected features. Our method: (i) can be used as an efficient pre-processing algorithm for standard ML algorithms to jointly address imbalance and group equity; and (ii) can be combined with fairness-aware learning algorithms to improve their robustness to varying levels of class imbalance. Additionally, we take a step toward bridging the gap between fairness and imbalanced learning with a new metric, Fair Utility, that combines balanced accuracy with fairness.
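
The abstract's credit example (imbalance across both the class label and a protected group) can be illustrated with a minimal sketch. The code below is plain random oversampling over (label, group) cells, offered only as an assumption-laden illustration of joint balancing; it is not the paper's Fair Oversampling algorithm, whose details are not given in the abstract. The function names `oversample_subgroups` and `cell_counts`, the dictionary keys, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def cell_counts(rows):
    """Count rows in each (label, group) cell."""
    return Counter((row["label"], row["group"]) for row in rows)


def oversample_subgroups(rows, seed=0):
    """Duplicate random rows until every observed (label, group) cell
    is as large as the largest cell.

    Illustrative assumption: plain random oversampling, NOT the
    paper's Fair Oversampling method.
    """
    rng = random.Random(seed)
    cells = {}
    for row in rows:
        cells.setdefault((row["label"], row["group"]), []).append(row)
    target = max(len(members) for members in cells.values())
    balanced = []
    for members in cells.values():
        balanced.extend(members)
        # Top up smaller cells by sampling with replacement.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


# Hypothetical toy data: credit granted (1) or not (0), by gender.
toy = (
    [{"label": 1, "group": "male"}] * 6
    + [{"label": 0, "group": "male"}] * 3
    + [{"label": 1, "group": "female"}] * 2
    + [{"label": 0, "group": "female"}] * 1
)
balanced = oversample_subgroups(toy)
print(cell_counts(toy))
print(cell_counts(balanced))
```

Balancing per cell rather than per class is the design choice that matters here: balancing only on the label could still leave the protected group under-represented within each class.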
