Paper Title
Don't Throw it Away! The Utility of Unlabeled Data in Fair Decision Making
Paper Authors
Paper Abstract
Decision making algorithms, in practice, are often trained on data that exhibits a variety of biases. Decision-makers often aim to take decisions based on some ground-truth target that is assumed or expected to be unbiased, i.e., equally distributed across socially salient groups. In many practical settings, the ground-truth cannot be directly observed, and instead, we have to rely on a biased proxy measure of the ground-truth, i.e., biased labels, in the data. In addition, data is often selectively labeled, i.e., even the biased labels are only observed for a small fraction of the data that received a positive decision. To overcome label and selection biases, recent work proposes to learn stochastic, exploring decision policies via i) online training of new policies at each time-step and ii) enforcing fairness as a constraint on performance. However, the existing approach uses only labeled data, disregarding a large amount of unlabeled data, and thereby suffers from high instability and variance in the learned decision policies at different times. In this paper, we propose a novel method based on a variational autoencoder for practical fair decision-making. Our method learns an unbiased data representation leveraging both labeled and unlabeled data and uses the representations to learn a policy in an online process. Using synthetic data, we empirically validate that our method converges to the optimal (fair) policy according to the ground-truth with low variance. In real-world experiments, we further show that our training approach not only offers a more stable learning process but also yields policies with higher fairness as well as utility than previous approaches.
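To make the described approach more concrete, below is a minimal, hypothetical sketch of the general idea in the abstract: a variational autoencoder (VAE) learns a latent representation from both labeled and unlabeled data, and a small stochastic policy head maps that representation to a decision probability. All module names, dimensions, and loss weights here are illustrative assumptions, not the authors' implementation; the fairness constraint and the online retraining loop are only indicated in comments.

```python
# Hypothetical sketch: VAE representation learning + policy head.
# Not the paper's code; dimensions and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationVAE(nn.Module):
    def __init__(self, x_dim=10, z_dim=4, hidden=64):
        super().__init__()
        # Encoder q(z|x): outputs mean and log-variance of the latent code.
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        # Decoder p(x|z): reconstructs the features from the latent code.
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

class PolicyHead(nn.Module):
    """Stochastic decision policy acting on the learned representation."""
    def __init__(self, z_dim=4):
        super().__init__()
        self.net = nn.Linear(z_dim, 1)

    def forward(self, z):
        # Probability of taking the positive decision for this individual.
        return torch.sigmoid(self.net(z))

if __name__ == "__main__":
    # Usage sketch: the VAE can be fit on all data (labeled and unlabeled),
    # while the policy head is updated online from the latent codes of newly
    # labeled individuals; a fairness penalty would be added to its loss.
    vae, policy = RepresentationVAE(), PolicyHead()
    opt = torch.optim.Adam(list(vae.parameters()) + list(policy.parameters()),
                           lr=1e-3)
    x = torch.randn(32, 10)              # a batch of (possibly unlabeled) features
    x_hat, mu, logvar = vae(x)
    loss = vae_loss(x, x_hat, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()
```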