Paper Title
Regularizing Class-wise Predictions via Self-knowledge Distillation
Paper Authors
Paper Abstract
Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate the issue, we propose a new regularization method that penalizes the predictive distribution between similar samples. In particular, we distill the predictive distribution between different samples of the same label during training. This results in regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a single network (i.e., a self-knowledge distillation) by forcing it to produce more meaningful and consistent predictions in a class-wise manner. Consequently, it mitigates overconfident predictions and reduces intra-class variations. Our experimental results on various image classification tasks demonstrate that the simple yet powerful method can significantly improve not only the generalization ability but also the calibration performance of modern convolutional neural networks.
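The abstract describes the class-wise regularization only at a high level. Below is a minimal PyTorch sketch of how such a penalty could look, assuming a KL-divergence term between the temperature-softened predictions of two different samples that share the same label, with gradients stopped on one side. The pairing strategy, the temperature, and the weight lambda_cls are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a class-wise self-knowledge distillation regularizer,
# following the idea in the abstract. The exact loss is not specified there,
# so this assumes a KL-divergence penalty between temperature-softened
# predictions of two samples sharing a label, with gradients stopped on one
# side. The temperature and lambda_cls are hypothetical hyperparameters.
import torch
import torch.nn.functional as F


def class_wise_regularizer(logits_a, logits_b, temperature=4.0):
    """KL penalty between predictions of two samples of the same class.

    logits_a: predictions for the current samples (gradients flow).
    logits_b: predictions for paired samples of the same label (detached).
    """
    log_p_a = F.log_softmax(logits_a / temperature, dim=1)
    p_b = F.softmax(logits_b.detach() / temperature, dim=1)
    # Scale by T^2, as is common when matching softened distributions.
    return F.kl_div(log_p_a, p_b, reduction="batchmean") * (temperature ** 2)


def training_step(model, x, y, x_same_label, lambda_cls=1.0):
    """One step combining cross-entropy with the class-wise penalty.

    x_same_label holds, for each sample in x, a different sample drawn
    from the same class (how pairs are sampled is an assumption here).
    """
    logits = model(x)
    with torch.no_grad():
        logits_other = model(x_same_label)
    ce = F.cross_entropy(logits, y)
    reg = class_wise_regularizer(logits, logits_other)
    return ce + lambda_cls * reg
```

Stopping gradients on one side of the KL term treats the paired sample's prediction as a soft target, which is what makes this a form of self-knowledge distillation rather than a symmetric consistency loss.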