Beyond Cats and Dogs: Semi-supervised Classification of Fuzzy Labels with Overclustering

Title

Beyond Cats and Dogs: Semi-supervised Classification of fuzzy labels with overclustering

Authors

Lars Schmarje, Johannes Brünger, Monty Santarossa, Simon-Martin Schröder, Rainer Kiko, Reinhard Koch

Abstract

A long-standing issue with deep learning is the need for large and consistently labeled datasets. Although current research in semi-supervised learning can reduce the required amount of annotated data by a factor of 10 or more, this line of research still uses distinct classes such as cats and dogs. In the real world, however, we often encounter problems where different experts have different opinions, which produces fuzzy labels. We propose a novel framework for the semi-supervised classification of such fuzzy labels. Our framework is based on the idea of overclustering to detect substructures in these fuzzy labels. We propose a novel loss to improve the overclustering capability of our framework and show on the common image classification dataset STL-10 that it is faster and has better overclustering performance than previous work. On a real-world plankton dataset, we illustrate the benefit of overclustering for fuzzy labels and show that we beat previous state-of-the-art semi-supervised methods. Moreover, we obtain 5 to 10% more consistent predictions of substructures.
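The abstract does not specify the proposed loss, so the sketch below only illustrates what "overclustering" means in practice: the network assigns samples to k clusters, with k larger than the number of annotated classes, and each cluster can later be read out as a class by taking the majority label of its labeled members. This majority-vote mapping is a common evaluation convention for clustering methods, assumed here for illustration; it is not the authors' loss or framework, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def map_overclusters_to_classes(cluster_ids, labels):
    """Assign each (over)cluster the majority ground-truth label of its members.

    Hypothetical helper: with more clusters than classes, several clusters
    may map to the same class, each capturing a different substructure.
    """
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        members[cluster].append(label)
    return {c: Counter(ys).most_common(1)[0][0] for c, ys in members.items()}


def overcluster_accuracy(cluster_ids, labels):
    """Fraction of samples whose cluster's majority label matches their own label."""
    mapping = map_overclusters_to_classes(cluster_ids, labels)
    correct = sum(mapping[c] == y for c, y in zip(cluster_ids, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy data: 2 classes, 4 clusters (k = 2 * number of classes).
    clusters = [0, 0, 1, 1, 1, 2, 2, 3]
    labels = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog"]
    print(map_overclusters_to_classes(clusters, labels))
    print(overcluster_accuracy(clusters, labels))
```

In this toy run, clusters 0 and 1 both map to "cat" and clusters 2 and 3 to "dog", so each class is split into two substructures; the single "dog" sample in cluster 1 is counted as an error, giving an accuracy of 7/8.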
