Learning from Noisy Similar and Dissimilar Data

Paper Title

Learning from Noisy Similar and Dissimilar Data

Authors

Soham Dan, Han Bao, Masashi Sugiyama

Abstract

With the widespread use of machine learning for classification, it is increasingly important to be able to use weaker kinds of supervision for tasks in which standard labeled data is hard to obtain. One such kind of supervision is pairwise: Similar (S) pairs, in which two examples belong to the same class, and Dissimilar (D) pairs, in which two examples belong to different classes. This kind of supervision is realistic in privacy-sensitive domains. Although this problem has been studied recently, it is unclear how to learn from such supervision under label noise, which is very common when the supervision is crowd-sourced. In this paper, we close this gap and demonstrate how to learn a classifier from noisy S- and D-labeled data. We perform a detailed investigation of this problem under two realistic noise models and propose two algorithms to learn from noisy S-D data. We also show important connections between learning from such pairwise supervision and learning from ordinary class-labeled data. Finally, we perform experiments on synthetic and real-world datasets and show that our noise-informed algorithms outperform noise-blind baselines in learning from noisy pairwise data.
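To make the pairwise setting concrete, the following minimal Python sketch builds S/D pair labels from hidden class labels and then corrupts them. The function name `make_sd_pairs`, the parameter `flip_prob`, and the uniform, independent flip noise are illustrative assumptions; the abstract does not specify the two noise models studied in the paper, so this should not be read as either of them.

```python
import random

def make_sd_pairs(labels, n_pairs, flip_prob=0.0, seed=0):
    """Sample index pairs and tag each as 'S' (same class) or 'D' (different class).

    flip_prob is a hypothetical uniform noise rate: each pair label is flipped
    independently with this probability. This is an illustrative assumption,
    not necessarily one of the paper's two noise models.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(labels)), 2)  # two distinct examples
        clean = "S" if labels[i] == labels[j] else "D"
        noisy = clean
        if rng.random() < flip_prob:
            noisy = "D" if clean == "S" else "S"  # flip the pair label
        pairs.append((i, j, clean, noisy))
    return pairs

# Toy binary class labels; the learner would only ever see the noisy pair labels.
labels = [+1, +1, -1, -1, +1, -1]
pairs = make_sd_pairs(labels, n_pairs=1000, flip_prob=0.2)
flipped = sum(clean != noisy for _, _, clean, noisy in pairs)
print(f"fraction of flipped pair labels: {flipped / len(pairs):.2f}")
```

In this sketch the class labels themselves are never exposed to the learner, which is what makes pairwise supervision plausible in privacy-sensitive settings; a noise-blind baseline would treat the noisy pair labels as if they were clean.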
