Paper Title
Continual Domain-Tuning for Pretrained Language Models
Paper Authors
Paper Abstract
Pre-trained language models (LMs) such as BERT, DistilBERT, and RoBERTa can be tuned for different domains (domain-tuning) by continuing the pre-training phase on a new target-domain corpus. This simple domain-tuning (SDT) technique has been widely used to create domain-tuned models such as BioBERT, SciBERT, and ClinicalBERT. However, during the pre-training phase on the target domain, the LM may catastrophically forget the patterns learned from its source domain. In this work, we study the effects of catastrophic forgetting on domain-tuned LMs and investigate methods that mitigate its negative effects. We propose continual learning (CL) based alternatives to SDT that aim to reduce catastrophic forgetting. We show that these methods can increase the performance of LMs on downstream target-domain tasks. Additionally, we show that constraining the LM from forgetting the source domain leads to downstream task models that are more robust to domain shifts. We analyze the computational cost of using our proposed CL methods and provide recommendations for computationally lightweight and effective CL domain-tuning procedures.
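The abstract does not name the specific CL methods used, so the following is only a minimal sketch, assuming an EWC-style quadratic penalty as one common CL baseline: continued pre-training on the target-domain corpus is regularized to stay close to the source-domain weights of the LM. The names `ewc_penalty`, `continual_domain_tuning_step`, `fisher`, and `lam` are illustrative, not taken from the paper.

```python
# Minimal sketch of CL-regularized domain-tuning (assumption: EWC-style penalty;
# the paper's actual CL methods are not specified in the abstract).
import torch


def ewc_penalty(model, source_params, fisher, lam=0.1):
    """Quadratic penalty that discourages drift from the source-domain weights."""
    loss = 0.0
    for name, p in model.named_parameters():
        if name in source_params:
            loss = loss + (fisher[name] * (p - source_params[name]) ** 2).sum()
    return lam * loss


def continual_domain_tuning_step(model, task_loss, source_params, fisher, optimizer):
    """One update on the target-domain corpus: pre-training loss (e.g. MLM) + CL penalty."""
    loss = task_loss + ewc_penalty(model, source_params, fisher)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage outline (hypothetical): snapshot the source-domain weights before tuning;
# `fisher` would normally be estimated on source-domain data (here: uniform weights).
# model = ...  # any pre-trained LM wrapped as an nn.Module
# source_params = {n: p.detach().clone() for n, p in model.named_parameters()}
# fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}
```

In this sketch the only extra state beyond SDT is the frozen snapshot of the source-domain weights (and the per-parameter weighting), so the per-step cost stays comparable to plain continued pre-training.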