Paper Title

Knowledge distillation via adaptive instance normalization

Authors

Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

Abstract

This paper addresses the problem of model compression via knowledge distillation. To this end, we propose a new knowledge distillation method based on transferring feature statistics, specifically the channel-wise mean and variance, from the teacher to the student. Our method goes beyond the standard way of enforcing the mean and variance of the student to be similar to those of the teacher through an $L_2$ loss, which we found to be of limited effectiveness. Specifically, we propose a new loss based on adaptive instance normalization to effectively transfer the feature statistics. The main idea is to transfer the learned statistics back to the teacher via adaptive instance normalization (conditioned on the student) and let the teacher network "evaluate" via a loss whether the statistics learned by the student are reliably transferred. We show that our distillation method outperforms other state-of-the-art distillation methods over a large set of experimental settings including different (a) network architectures, (b) teacher-student capacities, (c) datasets, and (d) domains.
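To make the idea in the abstract more concrete, below is a minimal PyTorch-style sketch of an AdaIN-based distillation loss. This is an illustration under assumptions, not the authors' implementation: the names `teacher_feat`, `student_feat`, and `teacher_head` (the teacher layers that follow the distilled feature map) are hypothetical, and the use of cross-entropy as the teacher's "evaluation" loss is one plausible choice.

```python
import torch
import torch.nn.functional as F


def adain(content, style_mean, style_std, eps=1e-5):
    # Re-normalize `content` features (N, C, H, W) so that each channel takes on
    # the given channel-wise statistics `style_mean` / `style_std` of shape (N, C).
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    normalized = (content - c_mean) / c_std
    return normalized * style_std[:, :, None, None] + style_mean[:, :, None, None]


def adain_distillation_loss(teacher_feat, student_feat, teacher_head, labels):
    # Sketch of the paper's idea: inject the student's channel-wise mean/std into
    # the teacher's features via AdaIN, then let the remaining teacher layers
    # (`teacher_head`, hypothetical) "evaluate" the transferred statistics through
    # a task loss (cross-entropy assumed here).
    s_mean = student_feat.mean(dim=(2, 3))
    s_std = student_feat.std(dim=(2, 3)) + 1e-5
    mixed = adain(teacher_feat, s_mean, s_std)
    logits = teacher_head(mixed)
    return F.cross_entropy(logits, labels)
```

In a sketch like this, the loss would typically be added to the student's usual training objective, with gradients flowing to the student only through its feature statistics; the teacher is kept frozen.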
