Paper Title
Selective Cross-Task Distillation
Paper Authors
Paper Abstract
The influx of pre-trained models empowers knowledge distillation by providing abundant teacher resources, but mechanisms to utilize these teachers adequately remain underdeveloped. With a massive model repository composed of teachers pre-trained on diverse tasks, we must surmount two obstacles when using knowledge distillation to learn a new task. First, given a fixed computing budget, it is not affordable to try each teacher and train the student repeatedly, making it necessary to identify the most contributive teacher precisely and efficiently. Second, semantic gaps exist between the teachers and the target student since they are trained on different tasks. Thus, we need to extract knowledge from a general label space that may differ from the student's. Faced with these two challenges, we study a new setting named selective cross-task distillation, which includes teacher assessment and generalized knowledge reuse. We bridge the teacher's label space and the student's label space through optimal transport. The transportation cost from the teacher's prediction to the student's prediction measures the relatedness between the two tasks and acts as an objective for distillation. Our method reuses cross-task knowledge from a distinct label space and efficiently assesses teachers without enumerating the model repository. Experiments demonstrate the effectiveness of our proposed method.
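To make the abstract's central idea concrete, the following is a minimal sketch (not the authors' implementation) of how an optimal-transport cost between a teacher's predictions over its own label space and a student's predictions over a different label space could be computed. The label embeddings, ground-cost construction, entropic regularization, and the `sinkhorn` routine are all illustrative assumptions; in the paper, the resulting cost both ranks candidate teachers and serves as a distillation objective.

```python
# Sketch: entropic-regularized optimal transport between two prediction
# distributions defined over different label spaces. All names and choices
# here (label embeddings, cost metric, Sinkhorn parameters) are assumptions
# for illustration, not the paper's exact method.
import numpy as np

def sinkhorn(cost, p, q, eps=0.1, n_iters=200):
    """Sinkhorn iterations: returns the transport plan and the total OT cost."""
    K = np.exp(-cost / eps)            # Gibbs kernel from the ground cost
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)              # scale columns to match marginal q
        u = p / (K @ v)                # scale rows to match marginal p
    plan = np.diag(u) @ K @ np.diag(v)
    return plan, float(np.sum(plan * cost))

rng = np.random.default_rng(0)

# Hypothetical label embeddings (e.g., word vectors) for each class name.
teacher_emb = rng.normal(size=(10, 32))   # teacher trained on 10 classes
student_emb = rng.normal(size=(5, 32))    # student targets 5 classes

# Ground cost between label spaces: pairwise distances between label embeddings.
cost = np.linalg.norm(teacher_emb[:, None, :] - student_emb[None, :, :], axis=-1)

# Softmax-style class distributions for one input from teacher and student.
teacher_pred = rng.dirichlet(np.ones(10))
student_pred = rng.dirichlet(np.ones(5))

plan, ot_cost = sinkhorn(cost, teacher_pred, student_pred)
print("OT cost (lower suggests a more related, more useful teacher):", ot_cost)
```

Under this reading, assessing a repository of teachers would amount to computing such a cost per teacher on the student's data and picking the teacher with the lowest value, while distillation would minimize the same cost with respect to the student's parameters.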