Paper Title

Knowledge Transfer via Dense Cross-Layer Mutual-Distillation

Paper Authors

Anbang Yao, Dawei Sun

Paper Abstract

Knowledge Distillation (KD) based methods adopt the one-way Knowledge Transfer (KT) scheme in which training a lower-capacity student network is guided by a pre-trained high-capacity teacher network. Recently, Deep Mutual Learning (DML) presented a two-way KT strategy, showing that the student network can also be helpful to improve the teacher network. In this paper, we propose Dense Cross-layer Mutual-distillation (DCM), an improved two-way KT method in which the teacher and student networks are trained collaboratively from scratch. To augment knowledge representation learning, well-designed auxiliary classifiers are added to certain hidden layers of both teacher and student networks. To boost KT performance, we introduce dense bidirectional KD operations between the layers appended with classifiers. After training, all auxiliary classifiers are discarded, and thus no extra parameters are introduced into the final models. We test our method on a variety of KT tasks, showing its superiority over related methods. Code is available at https://github.com/sundw2014/DCM
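To make the dense two-way KD idea concrete, below is a minimal PyTorch-style sketch, not the authors' released implementation (see the repository above), of a temperature-scaled KD loss applied in both directions between the auxiliary and final classifier outputs of two collaboratively trained networks. The function names (`kd_loss`, `dense_mutual_kd`) and the temperature value are illustrative assumptions.

```python
# Illustrative sketch of dense two-way cross-layer KD; the authors'
# actual code is at https://github.com/sundw2014/DCM.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-scaled KL divergence in one direction (teacher -> student)."""
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def dense_mutual_kd(logits_a, logits_b, T=2.0):
    """Sum bidirectional KD losses over all pairs of classifier outputs.

    logits_a / logits_b: lists of logits from the auxiliary and final
    classifiers of the two networks trained together from scratch.
    """
    loss_a, loss_b = 0.0, 0.0
    for za in logits_a:
        for zb in logits_b:
            # Each network mimics the other's (detached) soft predictions,
            # so the KD terms only propagate gradients into their own network.
            loss_a = loss_a + kd_loss(za, zb.detach(), T)
            loss_b = loss_b + kd_loss(zb, za.detach(), T)
    return loss_a, loss_b
```

In this sketch, each network's KD term is added to its own cross-entropy loss during joint training, and the auxiliary classifiers (and the KD terms) are simply dropped at inference time, matching the abstract's claim that the final models carry no extra parameters.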
