Paper Title

Compressing Features for Learning with Noisy Labels

Paper Authors

Yingyi Chen, Shell Xu Hu, Xi Shen, Chunrong Ai, Johan A. K. Suykens

Paper Abstract

Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels including those that are corrupted, and hence can hardly generalize to clean datasets. In this paper, we focus on the problem of learning with noisy labels and introduce compression inductive bias to network architectures to alleviate this over-fitting problem. More precisely, we revisit one classical regularization named Dropout and its variant Nested Dropout. Dropout can serve as a compression constraint for its feature dropping mechanism, while Nested Dropout further learns ordered feature representations w.r.t. feature importance. Moreover, the trained models with compression regularization are further combined with Co-teaching for performance boost. Theoretically, we conduct bias-variance decomposition of the objective function under compression regularization. We analyze it for both single model and Co-teaching. This decomposition provides three insights: (i) it shows that over-fitting is indeed an issue for learning with noisy labels; (ii) through an information bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; (iii) it gives explanations on the performance boost brought by incorporating compression regularization into Co-teaching. Experiments show that our simple approach can have comparable or even better performance than the state-of-the-art methods on benchmarks with real-world label noise including Clothing1M and ANIMAL-10N. Our implementation is available at https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/.
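To make the two ingredients mentioned in the abstract concrete, below is a minimal PyTorch-style sketch of (a) a Nested Dropout layer, which zeroes all feature units beyond a randomly sampled truncation index so that earlier units survive more often and features become ordered by importance, and (b) the standard Co-teaching small-loss exchange between two networks. These are illustrative sketches of the general techniques (Nested Dropout, Rippel et al.; Co-teaching, Han et al.), not the authors' released implementation; the geometric parameter p, the keep_ratio argument, the layer placement, and the toy dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NestedDropout(nn.Module):
    """Sketch of a Nested Dropout layer (not the paper's released code).

    During training, a truncation index k is sampled per example from a
    geometric distribution over feature dimensions, and every unit with
    index > k is zeroed. Low-index units are kept far more often, which
    pushes the network to pack the most important information into them,
    i.e. the compression effect described in the abstract.
    """

    def __init__(self, p: float = 0.01):
        super().__init__()
        self.p = p  # success probability of the geometric distribution (assumed value)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expects a 2-D feature tensor of shape (batch, dim); identity at eval time.
        if not self.training:
            return x
        batch, dim = x.shape
        # Sample one truncation index per example, clipped to the last unit.
        k = torch.distributions.Geometric(probs=self.p).sample((batch,)).long()
        k = torch.clamp(k, max=dim - 1)
        # Keep units 0..k, drop the rest.
        idx = torch.arange(dim, device=x.device).unsqueeze(0)          # (1, dim)
        mask = (idx <= k.to(x.device).unsqueeze(1)).to(x.dtype)        # (batch, dim)
        return x * mask


# Hypothetical placement: between a backbone encoder and the classifier head.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU())
model = nn.Sequential(encoder, NestedDropout(p=0.01), nn.Linear(512, 10))
```

The second sketch shows how two such compression-regularized networks could be trained with the usual Co-teaching small-loss exchange; the function name and keep_ratio schedule are hypothetical.

```python
def co_teaching_step(logits_a, logits_b, labels, keep_ratio):
    """One Co-teaching selection step (Han et al., 2018), sketched.

    Each network keeps the fraction `keep_ratio` of samples with the
    smallest loss and hands them to its peer, so that mostly clean
    samples drive the peer's gradient update.
    """
    n_keep = max(1, int(keep_ratio * labels.size(0)))
    loss_a = F.cross_entropy(logits_a, labels, reduction="none")
    loss_b = F.cross_entropy(logits_b, labels, reduction="none")
    idx_for_b = torch.argsort(loss_a)[:n_keep]  # A picks small-loss samples for B
    idx_for_a = torch.argsort(loss_b)[:n_keep]  # B picks small-loss samples for A
    loss_for_a = F.cross_entropy(logits_a[idx_for_a], labels[idx_for_a])
    loss_for_b = F.cross_entropy(logits_b[idx_for_b], labels[idx_for_b])
    return loss_for_a, loss_for_b
```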
