Variational Bayesian Inference for Crowdsourcing Predictions

Paper Title

Variational Bayesian Inference for Crowdsourcing Predictions

Authors

Cai, Desmond; Nguyen, Duc Thien; Lim, Shiau Hong; Wynter, Laura

Abstract

Crowdsourcing has emerged as an effective means of performing a number of machine learning tasks, such as annotating and labelling images and other data sets. In most early crowdsourcing settings, the task involved classification, that is, assigning one of a discrete set of labels to each task. Recently, however, more complex tasks have been attempted, including asking crowdsource workers to assign continuous labels, or predictions. In essence, this involves the use of crowdsourcing for function estimation. We are motivated by this problem to drive applications such as collaborative prediction, that is, harnessing the wisdom of the crowd to predict quantities more accurately. To do so, we propose a Bayesian approach aimed specifically at alleviating overfitting, a typical impediment to accurate prediction models in practice. In particular, we develop a variational Bayesian technique for two different worker noise models: one that assumes workers' noises are independent, and another that assumes workers' noises have a latent low-rank structure. Our evaluations on synthetic and real-world datasets demonstrate that these Bayesian approaches perform significantly better than existing non-Bayesian approaches and are thus potentially useful for this class of crowdsourcing problems.
