Paper Title
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement
Paper Authors
Paper Abstract
Fine-tuning deep neural networks pre-trained on large-scale datasets is one of the most practical transfer learning paradigms when only a limited number of training samples is available. To obtain better generalization, using the starting point as the reference (SPAR), either through weights or features, has been successfully applied to transfer learning as a regularizer. However, due to the domain discrepancy between the source and target tasks, such straightforward knowledge preservation carries an obvious risk of negative transfer. In this paper, we propose a novel transfer learning algorithm that introduces the idea of Target-awareness REpresentation Disentanglement (TRED), in which the knowledge relevant to the target task is disentangled from the original source model and used as a regularizer while fine-tuning the target model. Specifically, we design two alternative methods for the representation disentanglement: maximizing the Maximum Mean Discrepancy (Max-MMD) and minimizing the mutual information (Min-MI). Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average. TRED also outperforms related state-of-the-art transfer learning regularizers such as L2-SP, AT, DELTA, and BSS.
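The abstract names Max-MMD as one of the two disentanglement criteria. As a rough, hedged illustration only (not the authors' implementation), the sketch below shows a standard RBF-kernel estimate of the squared MMD between two batches of features; in the Max-MMD variant, such a term would be maximized to push target-relevant and target-irrelevant parts of the source representation apart. The function names, bandwidth choice, and toy inputs are assumptions introduced here for illustration.

```python
# Minimal sketch (assumed, not the paper's code): biased empirical estimate of
# the squared Maximum Mean Discrepancy (MMD) with a Gaussian (RBF) kernel.
import torch


def rbf_kernel(x, y, bandwidth=1.0):
    """RBF kernel matrix between the rows of x and the rows of y."""
    sq_dists = torch.cdist(x, y, p=2) ** 2            # pairwise squared distances
    return torch.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between two feature batches x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    # Toy usage: features of two hypothetical groups of channels (batch x dim).
    relevant = torch.randn(64, 128)
    irrelevant = torch.randn(64, 128) + 1.0            # shifted, so MMD is large
    print(mmd2(relevant, irrelevant).item())
```

In a Max-MMD style objective, the negative of such an estimate could serve as a loss term alongside the task loss, encouraging the two feature groups to follow clearly different distributions; the actual loss weighting and kernel settings used by TRED are not specified in this abstract.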