Paper Title
Knowledge Distillation for Multi-task Learning
Paper Authors
Paper Abstract
Multi-task learning (MTL) aims to learn a single model that performs multiple tasks, achieving good performance on all of them at a lower computational cost. Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics (e.g., cross-entropy, Euclidean loss), which leads to the imbalance problem in multi-task learning. To address the imbalance problem, we propose a knowledge distillation based method in this work. We first learn a task-specific model for each task. We then train the multi-task model to minimize the task-specific losses and to produce the same features as the task-specific models. As the task-specific networks encode different features, we introduce small task-specific adaptors that project the multi-task features onto the task-specific features. In this way, the adaptors align the task-specific and multi-task features, enabling balanced parameter sharing across tasks. Extensive experimental results demonstrate that our method optimizes the multi-task learning model in a more balanced way and achieves better overall performance.
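The abstract outlines a two-stage recipe: first train a frozen task-specific (teacher) model per task, then train a multi-task model that minimizes each task's supervised loss plus a distillation term in which a small per-task adaptor projects the shared multi-task feature onto the corresponding teacher's feature. The sketch below illustrates that combined objective; it is a minimal sketch assuming PyTorch, and names such as `TaskAdaptor`, `backbone`, `heads`, and the weight `lambda_distill` are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the distillation objective described in the abstract.
# Assumes PyTorch; module and argument names are hypothetical.
import torch
import torch.nn as nn


class TaskAdaptor(nn.Module):
    """Small per-task adaptor that projects the multi-task feature
    into the feature space of the corresponding task-specific (teacher) model."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, x):
        return self.proj(x)


def multi_task_loss(backbone, heads, adaptors, teachers, task_losses,
                    x, targets, lambda_distill=1.0):
    """Combine per-task supervised losses with a feature-distillation term.

    backbone:    shared multi-task feature extractor
    heads:       dict of task-specific prediction heads on top of the backbone
    adaptors:    dict of TaskAdaptor modules, one per task
    teachers:    dict of frozen task-specific models exposing .features(x)
    task_losses: dict of supervised criteria (e.g., cross-entropy, L2)
    """
    shared_feat = backbone(x)  # multi-task feature shared across tasks
    total = 0.0
    for task in heads:
        # 1) Task-specific supervised loss on the multi-task prediction.
        pred = heads[task](shared_feat)
        total = total + task_losses[task](pred, targets[task])

        # 2) Distillation loss: the adaptor-projected multi-task feature
        #    should match the frozen task-specific teacher's feature.
        with torch.no_grad():
            teacher_feat = teachers[task].features(x)
        aligned = adaptors[task](shared_feat)
        total = total + lambda_distill * nn.functional.mse_loss(aligned, teacher_feat)
    return total
```

Because each task gets its own adaptor, the distillation terms live in per-task feature spaces rather than forcing the shared feature itself to match any single teacher, which is what allows the parameter sharing to stay balanced across tasks.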