Paper Title

It's All in the Head: Representation Knowledge Distillation through Classifier Sharing

Authors

Emanuel Ben-Baruch, Matan Karklinsky, Yossi Biton, Avi Ben-Cohen, Hussam Lawen, Nadav Zamir

Abstract

Representation knowledge distillation aims to transfer rich information from one model to another. Common approaches to representation distillation focus mainly on directly minimizing distance metrics between the models' embedding vectors. Such direct methods may be limited in transferring high-order dependencies embedded in the representation vectors, or in handling the capacity gap between the teacher and student models. Moreover, in standard knowledge distillation, the teacher is trained without awareness of the student's characteristics and capacity. In this paper, we explore two mechanisms for enhancing representation distillation using classifier sharing between the teacher and student. We first investigate a simple scheme where the teacher's classifier is connected to the student backbone, acting as an additional classification head. Then, we propose a student-aware mechanism that aims to tailor the teacher model to a student with limited capacity by training the teacher with a temporary student head. We analyze and compare these two mechanisms and show their effectiveness on various datasets and tasks, including image classification, fine-grained classification, and face verification. In particular, we achieve state-of-the-art results for face verification on the IJB-C dataset for a MobileFaceNet model: TAR@(FAR=1e-5)=93.7%. Code is available at https://github.com/Alibaba-MIIL/HeadSharingKD.
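
The first mechanism described above, where the teacher's classifier serves as an additional head on the student backbone, can be sketched as a training objective. This is an illustrative form only, not taken from the abstract: the symbols $f_S$, $f_T$ (student and teacher backbones), $h_S$, $h_T$ (student and teacher classifiers), the weights $\alpha$, $\beta$, and the choice of squared L2 distance as the embedding distance metric are all assumptions, as is the requirement that student and teacher embeddings share the same dimension so that $h_T$ can be applied to $f_S(x)$.

```latex
% Assumed notation: x is an input, y its label, L_CE the cross-entropy loss.
% alpha and beta are hypothetical weighting hyperparameters.
\mathcal{L}(x, y) =
    \mathcal{L}_{\mathrm{CE}}\bigl(h_S(f_S(x)),\, y\bigr)
  + \alpha \, \mathcal{L}_{\mathrm{CE}}\bigl(h_T(f_S(x)),\, y\bigr)
  + \beta \, \bigl\lVert f_S(x) - f_T(x) \bigr\rVert_2^2
```

Under these assumptions, the first term trains the student's own head, the second passes the student embedding through the teacher's classifier, and the third is the direct embedding-distance term that the abstract attributes to common representation distillation approaches.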
