Paper Title
Knowledge Infused Decoding
Paper Authors
Paper Abstract
Pre-trained language models (LMs) have been shown to memorize a substantial amount of knowledge from the pre-training corpora; however, they are still limited in recalling factually correct knowledge given a certain context. Hence, they tend to suffer from counterfactual or hallucinatory generation when used in knowledge-intensive natural language generation (NLG) tasks. Recent remedies to this problem focus on modifying either the pre-training or task fine-tuning objectives to incorporate knowledge, which normally require additional costly training or architecture modification of LMs for practical applications. We present Knowledge Infused Decoding (KID) -- a novel decoding algorithm for generative LMs, which dynamically infuses external knowledge into each step of the LM decoding. Specifically, we maintain a local knowledge memory based on the current context, interacting with a dynamically created external knowledge trie, and continuously update the local memory as a knowledge-aware constraint to guide decoding via reinforcement learning. On six diverse knowledge-intensive NLG tasks, task-agnostic LMs (e.g., GPT-2 and BART) armed with KID outperform many task-optimized state-of-the-art models, and show particularly strong performance in few-shot scenarios over seven related knowledge-infusion techniques. Human evaluation confirms KID's ability to generate more relevant and factual language for the input context when compared with multiple baselines. Finally, KID also alleviates exposure bias and provides stable generation quality when generating longer sequences. Code for KID is available at https://github.com/microsoft/KID.
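To make the decoding mechanism concrete, below is a minimal, hypothetical Python sketch of the idea described in the abstract: a prefix trie over external knowledge snippets, a local memory of context suffixes that match paths in the trie, and a per-step re-scoring of candidate tokens. The names (KnowledgeTrie, kid_step, alpha) and the fixed logit bonus are illustrative assumptions, not the authors' API; in particular, the paper guides decoding with a policy learned via reinforcement learning, which this sketch approximates with a simple knowledge-consistency bonus.

```python
# Illustrative sketch only, not the authors' implementation. The paper
# learns the infusion policy with reinforcement learning; here that is
# approximated by a fixed logit bonus for knowledge-consistent tokens.
from collections import defaultdict

class KnowledgeTrie:
    """Prefix trie over tokenized knowledge snippets (the external trie)."""
    def __init__(self):
        self.children = defaultdict(KnowledgeTrie)

    def insert(self, tokens):
        node = self
        for t in tokens:
            node = node.children[t]

    def continuations(self, prefix):
        """Tokens that extend `prefix` along some stored knowledge snippet."""
        node = self
        for t in prefix:
            if t not in node.children:
                return set()
            node = node.children[t]
        return set(node.children)

def kid_step(logits, trie, memory, alpha=2.0):
    """One decoding step.

    `logits` maps candidate tokens to scores; `memory` is a list of
    context suffixes that currently match a path in the trie (a stand-in
    for the paper's local knowledge memory). Tokens that continue a
    matched knowledge phrase receive a bonus before the next token is
    chosen.
    """
    boosted = dict(logits)
    for suffix in memory:
        for tok in trie.continuations(suffix):
            if tok in boosted:
                boosted[tok] += alpha  # knowledge-aware constraint
    next_tok = max(boosted, key=boosted.get)  # greedy, for simplicity
    # Update the local memory: extend suffixes the new token continues,
    # and open a fresh suffix starting at the new token.
    memory = [s + [next_tok] for s in memory
              if next_tok in trie.continuations(s)]
    memory.append([next_tok])
    return next_tok, memory

# Toy usage: the trie steers decoding toward a factual continuation.
trie = KnowledgeTrie()
trie.insert("paris is the capital of france".split())
logits = {"is": 0.1, "was": 0.3, "the": 0.0}
tok, memory = kid_step(logits, trie, memory=[["paris"]])
print(tok)  # -> "is", boosted because it continues the stored fact
```

In this toy run, "was" has the highest raw score, but "is" continues a stored knowledge phrase and wins after the bonus, which is the intuition behind using the local memory as a knowledge-aware constraint at every decoding step.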