Paper Title
Long Short-Term Sample Distillation
Paper Authors
Paper Abstract
In the past decade, there has been substantial progress at training increasingly deep neural networks. Recent advances within the teacher--student training paradigm have established that information about past training updates shows promise as a source of guidance during subsequent training steps. Based on this notion, in this paper, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide the later training updates to a neural network, while efficiently proceeding in a single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher--student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, such that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.
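To make the training policy described in the abstract concrete, the sketch below is a minimal PyTorch-style illustration, not the authors' implementation. It keeps two per-sample probability banks: a long-term bank refreshed only every few epochs (so its entries are snapshots from several epochs ago) and a short-term bank refreshed every epoch, and it mixes both teacher signals with the ground-truth loss. All names (`lstsd_loss`, `long_bank`, `short_bank`, `alpha_long`, `alpha_short`, `long_refresh`, `short_refresh`) and the weighting and refresh schedule are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def lstsd_loss(logits, targets, long_probs, short_probs,
               alpha_long=0.3, alpha_short=0.3):
    """Combine ground-truth supervision with per-sample long-term and
    short-term teacher signals (weights and names are assumptions)."""
    ce = F.cross_entropy(logits, targets)            # standard supervised term
    log_p = F.log_softmax(logits, dim=-1)
    # Distillation terms: KL divergence toward the stored teacher
    # predictions for exactly these samples.
    kl_long = F.kl_div(log_p, long_probs, reduction="batchmean")
    kl_short = F.kl_div(log_p, short_probs, reduction="batchmean")
    return ((1.0 - alpha_long - alpha_short) * ce
            + alpha_long * kl_long + alpha_short * kl_short)

def train(model, loader, optimizer, num_classes, epochs,
          long_refresh=5, short_refresh=1, device="cpu"):
    """Single-generation training loop: the teachers are snapshots of the
    model's own past predictions, stored per sample rather than per model."""
    n = len(loader.dataset)
    long_bank = torch.full((n, num_classes), 1.0 / num_classes, device=device)
    short_bank = long_bank.clone()
    for epoch in range(epochs):
        for inputs, targets, idx in loader:  # loader must also yield sample ids
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)
            loss = lstsd_loss(logits, targets, long_bank[idx], short_bank[idx])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                probs = F.softmax(logits, dim=-1)
                if epoch % short_refresh == 0:   # recent snapshot: short-term teacher
                    short_bank[idx] = probs
                if epoch % long_refresh == 0:    # older snapshot: long-term teacher
                    long_bank[idx] = probs
```

Because the banks are indexed by sample id, every sample effectively has its own pair of teachers, which is one way to realize the per-sample teacher diversity the abstract emphasizes.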