Title
Multi-task Pre-training Language Model for Semantic Network Completion
Authors
Abstract
Semantic networks, such as knowledge graphs, represent knowledge by leveraging graph structures. Although knowledge graphs show promising value in natural language processing, they suffer from incompleteness. This paper focuses on knowledge graph completion by predicting links between entities, a fundamental yet critical task. Semantic matching is a potential solution because it can handle unseen entities, with which translational distance based methods struggle. However, to achieve performance competitive with translational distance based methods, semantic matching based methods require large-scale training datasets, which are typically unavailable in practical settings. We therefore employ a language model and introduce a novel knowledge graph completion architecture named LP-BERT, which consists of two main stages: multi-task pre-training and knowledge graph fine-tuning. In the pre-training phase, three tasks drive the model to learn relationships from triples by predicting either entities or relations. In the fine-tuning phase, inspired by contrastive learning, we design a triple-style negative sampling scheme within a batch, which greatly increases the proportion of negative samples while leaving the training time almost unchanged. Furthermore, we propose a new data augmentation method that exploits the inverse relationship of triples to improve the performance and robustness of the model. To demonstrate the effectiveness of our method, we conduct extensive experiments on three widely used datasets: WN18RR, FB15k-237, and UMLS. The experimental results demonstrate the superiority of our approach, which achieves state-of-the-art results on the WN18RR and FB15k-237 datasets. Notably, the Hits@10 metric improves by 5% over the previous state-of-the-art result on the WN18RR dataset, while reaching 100% on the UMLS dataset.
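To make the two fine-tuning ideas in the abstract concrete, the sketch below illustrates inverse-relation data augmentation and in-batch negative sampling. It is a minimal PyTorch illustration, not the authors' released implementation: the `inv_` relation tokens, the embedding shapes, and the cosine-similarity scoring are assumptions standing in for LP-BERT's actual BERT-based encoder.

```python
# Minimal sketch of two ideas from the abstract (not LP-BERT's published code):
# (1) inverse-relation data augmentation, and (2) in-batch negative sampling.
import torch
import torch.nn.functional as F

def augment_with_inverses(triples):
    """For every (head, relation, tail), also add (tail, inv_relation, head).

    The "inv_" prefix is a hypothetical marker token; the paper only states
    that inverse relationships of triples are used for augmentation.
    """
    augmented = list(triples)
    for h, r, t in triples:
        augmented.append((t, f"inv_{r}", h))
    return augmented

def in_batch_negative_loss(query_emb, tail_emb, temperature=0.05):
    """Contrastive-style loss over a batch of B triples.

    For each (head, relation) query embedding, the matching tail embedding is
    the positive and the other B-1 tails in the same batch act as negatives,
    so the number of negatives grows with batch size at no extra encoding
    cost, which is why training time stays almost unchanged.
    """
    # Pairwise cosine-similarity logits between all queries and tails: (B, B)
    logits = F.cosine_similarity(
        query_emb.unsqueeze(1), tail_emb.unsqueeze(0), dim=-1
    ) / temperature
    labels = torch.arange(query_emb.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Usage with random embeddings standing in for BERT-encoded representations:
triples = [("Paris", "capital_of", "France"), ("Tokyo", "capital_of", "Japan")]
print(augment_with_inverses(triples))
B, d = 8, 128
print(in_batch_negative_loss(torch.randn(B, d), torch.randn(B, d)).item())
```

Under these assumptions, doubling the training set via inverse triples and reusing every in-batch tail as a negative together raise the effective number of training signals per step without additional forward passes through the encoder.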