Title
On Learning and Enforcing Latent Assessment Models using Binary Feedback from Human Auditors Regarding Black-Box Classifiers
Authors
Abstract
Algorithmic fairness literature presents numerous mathematical notions and metrics, and also points to tradeoffs that arise when attempting to satisfy some or all of them simultaneously. Furthermore, the contextual nature of fairness notions makes it difficult to automate bias evaluation in diverse algorithmic systems. Therefore, in this paper, we propose a novel model called the latent assessment model (LAM) to characterize binary feedback provided by human auditors, by assuming that the auditor compares the classifier's output to his or her own intrinsic judgment for each input. We prove that individual and group fairness notions are guaranteed as long as the auditor's intrinsic judgments inherently satisfy the fairness notion at hand and are relatively similar to the classifier's evaluations. We also demonstrate this relationship between LAM and traditional fairness notions on three well-known datasets, namely the COMPAS, German Credit, and Adult Census Income datasets. Furthermore, we derive the minimum number of feedback samples needed to obtain PAC learning guarantees when estimating the LAM of a black-box classifier. These guarantees are validated by training standard machine learning algorithms on real binary feedback elicited from 400 human auditors regarding COMPAS.
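As an informal illustration of the feedback mechanism the abstract describes, the sketch below models an auditor who reports binary (agree/disagree) feedback by comparing the black-box classifier's output to their own intrinsic judgment on each input. All names and the specific encoding here are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of LAM-style binary feedback (assumed encoding:
# 1 = auditor agrees with the classifier's output, 0 = disagrees).
# The function names and data are illustrative, not from the paper.

def lam_feedback(classifier_output: int, intrinsic_judgment: int) -> int:
    """Auditor's binary feedback: agree (1) when the classifier's output
    matches the auditor's own intrinsic judgment, else disagree (0)."""
    return 1 if classifier_output == intrinsic_judgment else 0

# Example: the classifier predicts [1, 1, 1] on three inputs, while the
# auditor's intrinsic judgments are [1, 0, 1].
classifier_outputs = [1, 1, 1]
intrinsic_judgments = [1, 0, 1]
feedback = [lam_feedback(c, j)
            for c, j in zip(classifier_outputs, intrinsic_judgments)]
print(feedback)  # [1, 0, 1]
```

Under this reading, a stream of such binary labels over sampled inputs is the training signal from which the auditor's latent assessment model can be estimated, which is where the paper's PAC sample-complexity bounds apply.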