Paper Title
KALA: Knowledge-Augmented Language Model Adaptation
Paper Authors
Paper Abstract
Pre-trained language models (PLMs) have achieved remarkable success on various natural language understanding tasks. Simple fine-tuning of PLMs, on the other hand, might be suboptimal for domain-specific tasks because they cannot possibly cover knowledge from all domains. While adaptive pre-training of PLMs can help them obtain domain-specific knowledge, it requires a large training cost. Moreover, adaptive pre-training can harm the PLM's performance on the downstream task by causing catastrophic forgetting of its general knowledge. To overcome such limitations of adaptive pre-training for PLM adaptation, we propose a novel domain adaptation framework for PLMs coined Knowledge-Augmented Language Model Adaptation (KALA), which modulates the intermediate hidden representations of PLMs with domain knowledge, consisting of entities and their relational facts. We validate the performance of our KALA on question answering and named entity recognition tasks on multiple datasets across various domains. The results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training. Code is available at: https://github.com/Nardien/KALA/.
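To make the core idea of "modulating intermediate hidden representations with domain knowledge" concrete, below is a minimal PyTorch sketch of a scale-and-shift modulation layer conditioned on entity embeddings. The class name, dimensions, and the way entity embeddings are aligned to tokens are illustrative assumptions, not the paper's exact implementation; see the official repository linked above for the real code.

```python
import torch
import torch.nn as nn


class KnowledgeModulation(nn.Module):
    """Illustrative scale-and-shift modulation of PLM hidden states
    conditioned on entity embeddings (names/shapes are assumptions)."""

    def __init__(self, hidden_dim: int, entity_dim: int):
        super().__init__()
        # Project the entity representation to a per-dimension scale and shift.
        self.to_scale = nn.Linear(entity_dim, hidden_dim)
        self.to_shift = nn.Linear(entity_dim, hidden_dim)

    def forward(self, hidden: torch.Tensor, entity: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim) intermediate PLM representations
        # entity: (batch, seq_len, entity_dim) entity embedding per token,
        #         zeros where a token mentions no entity, so those tokens
        #         pass through (almost) unchanged.
        scale = self.to_scale(entity)
        shift = self.to_shift(entity)
        return (1.0 + scale) * hidden + shift


if __name__ == "__main__":
    # Toy usage: modulate random "hidden states" with one token's entity embedding.
    layer = KnowledgeModulation(hidden_dim=768, entity_dim=128)
    hidden = torch.randn(2, 16, 768)
    entity = torch.zeros(2, 16, 128)
    entity[:, 3] = torch.randn(2, 128)  # pretend token 3 mentions a domain entity
    out = layer(hidden, entity)
    print(out.shape)  # torch.Size([2, 16, 768])
```

In this sketch, tokens without an associated entity receive a near-identity transformation, while tokens mentioning domain entities have their representations scaled and shifted by knowledge-derived parameters, which is one plausible way to inject entity and relational information without re-pre-training the PLM.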