Paper Title

Layerwise Bregman Representation Learning with Applications to Knowledge Distillation

Authors

Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

Abstract

In this work, we propose a novel approach for layerwise representation learning of a trained neural network. In particular, we form a Bregman divergence based on the layer's transfer function and construct an extension of the original Bregman PCA formulation by incorporating a mean vector and normalizing the principal directions with respect to the geometry of the local convex function around the mean. This generalization allows exporting the learned representation as a fixed layer with a non-linearity. As an application to knowledge distillation, we cast the learning problem for the student network as predicting the compression coefficients of the teacher's representations, which are passed as the input to the imported layer. Our empirical findings indicate that our approach is substantially more effective for transferring information between networks than typical teacher-student training using the teacher's penultimate layer representations and soft labels.
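
To make the construction described in the abstract concrete: for an elementwise transfer function f that is the gradient of a convex function F, the induced Bregman divergence is D_F(x, y) = F(x) − F(y) − ⟨f(y), x − y⟩. The sketch below is an illustrative reading of that setup, not code from the paper; the choice of a sigmoid layer (with F(x) = Σ_i log(1 + e^{x_i})) is an assumption made for the example.

```python
import numpy as np

def sigmoid(x):
    """Elementwise sigmoid: the layer's transfer function f = grad F."""
    return 1.0 / (1.0 + np.exp(-x))

def F(x):
    """Convex integral of the sigmoid: F(x) = sum_i log(1 + exp(x_i))."""
    return np.sum(np.logaddexp(0.0, x))

def bregman_divergence(x, y):
    """D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>.

    Nonnegative for all x, y, and zero iff x == y, since F is strictly convex.
    """
    return F(x) - F(y) - sigmoid(y) @ (x - y)

# Example: divergence between two pre-activation vectors.
x = np.array([0.5, -1.0, 2.0])
y = np.zeros(3)
print(bregman_divergence(x, y))  # ~0.585, always >= 0
```

The same recipe applies to other elementwise non-linearities: any transfer function that is the gradient of a convex F induces a divergence in this way, which is what lets the learned representation be exported as a fixed layer with that non-linearity.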
