Paper Title
Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting
Paper Authors
Paper Abstract
Deep pretrained language models have achieved great success under the paradigm of pretraining first and then fine-tuning. However, such sequential transfer learning often confronts the catastrophic forgetting problem and leads to sub-optimal performance. To fine-tune with less forgetting, we propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks. Specifically, we propose a Pretraining Simulation mechanism to recall the knowledge from pretraining tasks without their data, and an Objective Shifting mechanism to gradually shift the learning focus to downstream tasks. Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark. Our method also enables BERT-base to achieve better performance than direct fine-tuning of BERT-large. Further, we provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into the Adam optimizer, to facilitate the NLP community.
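As a rough illustration of the two mechanisms named in the abstract (not the authors' released RecAdam implementation), the sketch below assumes that Pretraining Simulation can be approximated by a quadratic penalty pulling the current weights toward a frozen copy of the pretrained weights, and that Objective Shifting can be realized as a sigmoid annealing coefficient that moves the objective from the recall term to the downstream task loss. The helper name `recall_and_learn_loss` and the hyperparameters `gamma`, `k`, and `t0` are hypothetical and chosen only for illustration.

```python
import math

def recall_and_learn_loss(task_loss, model, pretrained_params,
                          step, gamma=5e-3, k=0.1, t0=250):
    """Hypothetical sketch of a recall-and-learn fine-tuning objective.

    - Pretraining Simulation (assumed form): an L2 penalty toward the
      pretrained weights, standing in for the pretraining tasks without
      needing their data.
    - Objective Shifting (assumed form): a sigmoid schedule lambda(t)
      that gradually shifts the focus to the downstream task loss.
    """
    # Objective Shifting: lambda rises from ~0 toward 1 as training proceeds.
    lam = 1.0 / (1.0 + math.exp(-k * (step - t0)))

    # Pretraining Simulation: recall pretrained knowledge via a quadratic
    # penalty between current parameters and the frozen pretrained copy.
    recall = sum(((p - p0) ** 2).sum()
                 for p, p0 in zip(model.parameters(), pretrained_params))

    # Combined objective: shift weight from the recall term to the task loss.
    return lam * task_loss + (1.0 - lam) * 0.5 * gamma * recall

# Usage sketch (assumed workflow): snapshot the pretrained weights before
# fine-tuning, e.g.
#   pretrained_params = [p.detach().clone() for p in model.parameters()]
# then optimize recall_and_learn_loss(...) in place of the plain task loss.
```

The paper's RecAdam optimizer folds this kind of combined update directly into Adam; the standalone loss above is only meant to convey how the recall term and the shifting coefficient interact during fine-tuning.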