Paper Title

CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation

Paper Authors

Ibtihel Amara, Maryam Ziaeefard, Brett H. Meyer, Warren Gross, James J. Clark

Paper Abstract

Knowledge distillation (KD) is an effective tool for compressing deep classification models for edge devices. However, the performance of KD is affected by the large capacity gap between the teacher and student networks. Recent methods have resorted to a multiple teacher assistant (TA) setting for KD, which sequentially decreases the size of the teacher model to progressively bridge the size gap between these models. This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation (CES-KD) to efficiently enhance the learning of a compact student under the capacity gap problem. This technique is built upon the hypothesis that a student network should be guided gradually using a stratified teaching curriculum, since it learns easy (hard) data samples better and faster from a lower (higher) capacity teacher network. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image based on a curriculum driven by the difficulty of classifying the image. In this work, we empirically verify our hypothesis and rigorously experiment with the CIFAR-10, CIFAR-100, CINIC-10, and ImageNet datasets, showing improved accuracy on VGG-like, ResNet, and WideResNet architectures.
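To make the per-image expert selection concrete, below is a minimal PyTorch sketch of one way a difficulty-driven teacher choice could be wired into a standard KD loss. The names and rules here (`select_expert`, `ces_kd_loss`, the linear bucketing of difficulty scores, and the `alpha`/`T` weighting) are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Sketch: curriculum-driven selection of one teacher per sample for KD.
# The difficulty score, bucketing rule, and loss weights are assumptions.
import torch
import torch.nn.functional as F


def select_expert(difficulty, num_experts):
    """Map a per-sample difficulty score in [0, 1] to a teacher index:
    easier samples -> lower-capacity teachers, harder -> higher-capacity."""
    idx = (difficulty * num_experts).long().clamp(max=num_experts - 1)
    return idx


def ces_kd_loss(student_logits, teacher_logits_list, labels, difficulty,
                T=4.0, alpha=0.5):
    """KD loss where each image is distilled only from the teacher
    selected by its difficulty score (hypothetical CES-KD-style variant)."""
    num_experts = len(teacher_logits_list)
    expert_idx = select_expert(difficulty, num_experts)

    # Stack teacher outputs to shape (experts, batch, classes), then pick
    # one teacher's logits per sample with advanced indexing.
    stacked = torch.stack(teacher_logits_list)          # (E, B, C)
    batch_idx = torch.arange(student_logits.size(0))
    chosen = stacked[expert_idx, batch_idx]             # (B, C)

    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(chosen / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


if __name__ == "__main__":
    B, C, E = 8, 10, 3                                  # batch, classes, teachers/TAs
    student = torch.randn(B, C)
    teachers = [torch.randn(B, C) for _ in range(E)]    # low -> high capacity
    labels = torch.randint(0, C, (B,))
    difficulty = torch.rand(B)                          # e.g. derived from teacher confidence
    print(ces_kd_loss(student, teachers, labels, difficulty).item())
```

In this sketch, the difficulty score stands in for whatever curriculum signal ranks images from easy to hard; only the routing of each sample to a single teacher is meant to mirror the idea described in the abstract.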
