Paper Title

Unbiased Subdata Selection for Fair Classification: A Unified Framework and Scalable Algorithms

Paper Authors

Qing Ye, Weijun Xie

Paper Abstract

As an important problem in modern data analytics, classification has seen a wide variety of applications across different domains. Unlike conventional classification approaches, fair classification addresses the issue of unintentional bias against sensitive features (e.g., gender, race). Due to the high nonconvexity of fairness measures, existing methods are often unable to model exact fairness, which can lead to inferior fair classification outcomes. This paper fills the gap by developing a novel unified framework that jointly optimizes accuracy and fairness. The proposed framework is versatile: it can precisely incorporate different fairness measures studied in the literature and is applicable to many classifiers, including deep classification models. Specifically, we first prove the Fisher consistency of the proposed framework. We then show that many classification models within this framework can be recast as mixed-integer convex programs, which can be solved effectively by off-the-shelf solvers when instance sizes are moderate and can serve as benchmarks for comparing the efficiency of approximation algorithms. We prove that, in the proposed framework, when the classification outcomes are known, the resulting problem, termed "unbiased subdata selection," is solvable in strongly polynomial time and can be used to enhance classification fairness by selecting more representative data points. This motivates us to develop an iterative refining strategy (IRS) for solving large-scale instances, in which we improve the classification accuracy and perform unbiased subdata selection in an alternating fashion. We study the convergence property of IRS and derive its approximation bound. More broadly, the framework can be leveraged to improve classification models with imbalanced data by taking the F1 score into consideration.
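The alternating accuracy/selection loop described in the abstract can be pictured with a short sketch. The code below is a simplified, hypothetical illustration, not the paper's IRS algorithm: it assumes binary labels `y`, a discrete sensitive feature `s`, a logistic-regression accuracy step, and a greedy, group-balanced stand-in for the subdata-selection step (all function names, such as `select_subdata` and `iterative_refining`, are illustrative).

```python
# Hypothetical sketch of an alternating refine-and-select loop.
# Assumptions: X is an (n, d) feature matrix, y holds binary labels,
# s holds a discrete sensitive feature; the greedy selection below is a
# stand-in for the paper's exact unbiased subdata selection step.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_subdata(pred, y, s, keep_frac=0.9):
    """Keep the same fraction of points within each sensitive group,
    preferring points the current classifier predicts correctly, so the
    retained subdata stays balanced across groups."""
    keep = np.zeros(len(y), dtype=bool)
    for g in np.unique(s):
        idx = np.where(s == g)[0]
        correct = (pred[idx] == y[idx]).astype(float)
        order = idx[np.argsort(-correct)]             # correct points first
        keep[order[: int(keep_frac * len(idx))]] = True
    return keep

def iterative_refining(X, y, s, n_iters=10, keep_frac=0.9):
    """Alternate between (i) fitting the classifier on the selected subdata
    and (ii) re-selecting subdata given the current predictions."""
    keep = np.ones(len(y), dtype=bool)
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_iters):
        model.fit(X[keep], y[keep])                   # accuracy step
        pred = model.predict(X)                       # score all points
        keep = select_subdata(pred, y, s, keep_frac)  # selection step
    return model
```

In the paper's framework, the selection step is solved exactly (it is shown to be strongly polynomial-solvable) rather than greedily, and the accuracy step can use any classifier in the framework, including deep models.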
