Paper title
Competition over data: how does data purchase affect users?
Authors
Abstract
As machine learning (ML) is deployed by many competing service providers, the underlying ML predictors also compete against each other, and it is increasingly important to understand the impacts and biases that arise from such competition. In this paper, we study what happens when the competing predictors can acquire additional labeled data to improve their prediction quality. We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users. Our environment models a critical aspect of data acquisition in competing systems that has not been well studied before. We find that the overall performance of an ML predictor improves when predictors can purchase additional labeled data. Surprisingly, however, the quality that users experience (i.e., the accuracy of the predictor selected by each user) can decrease even as the individual predictors get better. We show that this phenomenon arises naturally from a trade-off: competition pushes each predictor to specialize in a subset of the population, while data purchase makes predictors more uniform. We support our findings with both experiments and theory.
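The gap between a predictor's overall performance and the quality users experience can be illustrated with a toy calculation. The sketch below is not the paper's environment: the two-group population, the accuracy numbers, and the rule that every user picks the most accurate predictor for their group are all assumptions made for illustration, and the function names are hypothetical.

```python
# Toy illustration only: all numbers and the user-choice rule are assumptions,
# not values or mechanisms taken from the paper.

def overall_accuracy(acc, weights):
    """Population-weighted accuracy of each predictor.

    acc[i][g] is the accuracy of predictor i on group g;
    weights[g] is the share of the population in group g.
    """
    return [sum(w * a for w, a in zip(weights, row)) for row in acc]


def user_quality(acc, weights):
    """Average accuracy of the predictor each user selects.

    Assumes every user in group g picks the predictor that is most
    accurate on group g (a simplifying assumption).
    """
    return sum(w * max(row[g] for row in acc) for g, w in enumerate(weights))


weights = [0.5, 0.5]            # two equally sized groups (assumed)

# Specialized predictors: each is strong on one group, weak on the other.
specialized = [[0.9, 0.5],
               [0.5, 0.9]]

# More uniform predictors, as might result after purchasing extra labels.
uniform = [[0.8, 0.8],
           [0.8, 0.8]]

for name, acc in [("specialized", specialized), ("uniform", uniform)]:
    overall = [round(x, 3) for x in overall_accuracy(acc, weights)]
    quality = round(user_quality(acc, weights), 3)
    print(f"{name}: overall per predictor = {overall}, user quality = {quality}")
```

In this toy setting each predictor's overall accuracy rises from about 0.7 to 0.8 when the predictors become uniform, while the quality users experience falls from about 0.9 to 0.8, because users no longer benefit from a predictor specialized on their group.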