Paper Title


Debiased Learning from Naturally Imbalanced Pseudo-Labels

Authors

Xudong Wang, Zhirong Wu, Long Lian, Stella X. Yu

Abstract


Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting. Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data. If we address this previously unknown imbalanced classification problem arising from pseudo-labels instead of ground-truth training labels, we could remove model biases towards false majorities created by pseudo-labels. We propose a novel and effective debiased learning method with pseudo-labels, based on counterfactual reasoning and adaptive margins: The former removes the classifier response bias, whereas the latter adjusts the margin of each class according to the imbalance of pseudo-labels. Validated by extensive experimentation, our simple debiased learning delivers significant accuracy gains over the state-of-the-art on ImageNet-1K: 26% for semi-supervised learning with 0.2% annotations and 9% for zero-shot learning. Our code is available at: https://github.com/frank-xwang/debiased-pseudo-labeling.
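The adaptive-margin idea in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes a logit-adjustment-style margin derived from pseudo-label frequencies, plus a simple proxy for classifier response bias (subtracting the mean logit per class). The function name `debiased_logits` and the parameters `tau` and `lam` are hypothetical.

```python
import numpy as np

def debiased_logits(logits, pseudo_label_freq, tau=0.4, lam=0.5):
    """Hedged sketch of margin-adjusted, bias-corrected logits.

    logits: (batch, num_classes) raw classifier outputs.
    pseudo_label_freq: (num_classes,) counts of each class among
        current pseudo-labels (the "false majority" signal).
    tau, lam: illustrative hyperparameters, not from the paper.
    """
    # Proxy for classifier response bias: the average response per
    # class over the batch; subtracting a fraction of it pushes
    # predictions away from habitually favored classes.
    response_bias = logits.mean(axis=0, keepdims=True)
    adjusted = logits - lam * response_bias

    # Adaptive margin from pseudo-label imbalance: classes that are
    # pseudo-label-frequent get a larger additive margin, so they
    # must produce a larger raw logit to win during training.
    prior = pseudo_label_freq / pseudo_label_freq.sum()
    margin = tau * np.log(prior + 1e-12)
    return adjusted + margin  # feed into softmax cross-entropy
```

With uniform logits, the pseudo-label-majority class receives the largest (least negative) margin, which during training demands stronger evidence before that class is predicted, counteracting the bias toward false majorities.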
