Paper Title
Long Short-Term Sample Distillation
Paper Authors
Paper Abstract
In the past decade, there has been substantial progress at training increasingly deep neural networks. Recent advances within the teacher--student training paradigm have established that information about past training updates shows promise as a source of guidance during subsequent training steps. Based on this notion, in this paper, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide the later training updates to a neural network, while efficiently proceeding in a single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher--student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, such that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.
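To make the training policy described in the abstract concrete, the sketch below is a minimal PyTorch-style illustration, not the authors' implementation. It keeps two per-sample probability banks: a long-term bank refreshed only every few epochs (so its entries are snapshots from several epochs ago) and a short-term bank refreshed every epoch, and it mixes both teacher signals with the ground-truth loss. All names (`lstsd_loss`, `long_bank`, `short_bank`, `alpha_long`, `alpha_short`, `long_refresh`, `short_refresh`) and the weighting and refresh schedule are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def lstsd_loss(logits, targets, long_probs, short_probs,
               alpha_long=0.3, alpha_short=0.3):
    """Combine ground-truth supervision with per-sample long-term and
    short-term teacher signals (weights and names are assumptions)."""
    ce = F.cross_entropy(logits, targets)            # standard supervised term
    log_p = F.log_softmax(logits, dim=-1)
    # Distillation terms: KL divergence toward the stored teacher
    # predictions for exactly these samples.
    kl_long = F.kl_div(log_p, long_probs, reduction="batchmean")
    kl_short = F.kl_div(log_p, short_probs, reduction="batchmean")
    return ((1.0 - alpha_long - alpha_short) * ce
            + alpha_long * kl_long + alpha_short * kl_short)

def train(model, loader, optimizer, num_classes, epochs,
          long_refresh=5, short_refresh=1, device="cpu"):
    """Single-generation training loop: the teachers are snapshots of the
    model's own past predictions, stored per sample rather than per model."""
    n = len(loader.dataset)
    long_bank = torch.full((n, num_classes), 1.0 / num_classes, device=device)
    short_bank = long_bank.clone()
    for epoch in range(epochs):
        for inputs, targets, idx in loader:  # loader must also yield sample ids
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)
            loss = lstsd_loss(logits, targets, long_bank[idx], short_bank[idx])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                probs = F.softmax(logits, dim=-1)
                if epoch % short_refresh == 0:   # recent snapshot: short-term teacher
                    short_bank[idx] = probs
                if epoch % long_refresh == 0:    # older snapshot: long-term teacher
                    long_bank[idx] = probs
```

Because the banks are indexed by sample id, every sample effectively has its own pair of teachers, which is one way to realize the per-sample teacher diversity the abstract emphasizes.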