Paper Title
A Unified Knowledge Graph Augmentation Service for Boosting Domain-specific NLP Tasks
Authors
Abstract
By focusing the pre-training process on domain-specific corpora, some domain-specific pre-trained language models (PLMs) have achieved state-of-the-art results. However, a unified paradigm for injecting domain knowledge during the PLM fine-tuning stage remains under-investigated. We propose KnowledgeDA, a unified domain language model development service that enhances the task-specific training procedure with domain knowledge graphs. Given domain-specific task texts as input, KnowledgeDA automatically generates a domain-specific language model in three steps: (i) localize domain knowledge entities in the texts via an embedding-similarity approach; (ii) generate augmented samples by retrieving replaceable domain entity pairs from two views, the knowledge graph and the training data; (iii) select high-quality augmented samples for fine-tuning via confidence-based assessment. We implement a prototype of KnowledgeDA to learn language models for two domains, healthcare and software development. Experiments on domain-specific text classification and QA tasks verify the effectiveness and generalizability of KnowledgeDA.
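The three steps in the abstract can be illustrated with a toy sketch. This is not the KnowledgeDA implementation. Every name below (`localize_entities`, `augment`, `select_confident`, the `replaceable` table, the thresholds) is hypothetical. Character n-gram vectors stand in for learned embeddings, and the selection rule (keep samples whose confidence on the original label meets a minimum) is an assumption, since the abstract does not state the criterion.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Toy 'embedding': a bag of character n-grams (stand-in for a learned vector)."""
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + n] for i in range(max(1, len(padded) - n + 1)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def localize_entities(tokens, kg_entities, threshold=0.6):
    """Step (i): link each token to its most similar KG entity, if similar enough."""
    matches = {}
    for i, tok in enumerate(tokens):
        tok_vec = char_ngrams(tok)
        best = max(kg_entities, key=lambda e: cosine(tok_vec, char_ngrams(e)))
        if cosine(tok_vec, char_ngrams(best)) >= threshold:
            matches[i] = best
    return matches


def augment(tokens, matches, replaceable):
    """Step (ii): create one new sample per (matched position, replacement entity)."""
    samples = []
    for i, entity in matches.items():
        for alt in replaceable.get(entity, []):
            new_tokens = list(tokens)
            new_tokens[i] = alt
            samples.append(" ".join(new_tokens))
    return samples


def select_confident(samples, label, predict_proba, min_conf=0.5):
    """Step (iii): keep samples whose confidence on the original label is >= min_conf.

    Assumption: the direction and value of the threshold are illustrative only.
    """
    return [s for s in samples if predict_proba(s).get(label, 0.0) >= min_conf]


# Hypothetical healthcare example; the entity list and pairs are hand-written.
tokens = "patient reports headaches and fever".split()
kg_entities = ["headache", "fever", "nausea"]
replaceable = {"headache": ["migraine"], "fever": ["pyrexia"]}

matches = localize_entities(tokens, kg_entities)
augmented = augment(tokens, matches, replaceable)


def toy_predict_proba(sentence):
    """Placeholder for a fine-tuned task model's class probabilities."""
    return {"symptom": 0.9 if "migraine" in sentence else 0.3}


selected = select_confident(augmented, "symptom", toy_predict_proba)
```

In a real pipeline, the n-gram vectors would be replaced by embeddings from a language model, the `replaceable` table would be retrieved from the knowledge graph and the training data, and `toy_predict_proba` would be the task model being fine-tuned.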