Paper title
Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification
Paper authors
Paper abstract
With the rapid growth of research publications, a vast amount of scholarly knowledge needs to be organized in digital libraries. To address this challenge, techniques that rely on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring the relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been widely explored for automatic relation classification. Despite their remarkable contributions, many of these methods were evaluated under different scenarios, which limits their comparability. To this end, we present a thorough empirical evaluation of eight BERT-based classification models, focusing on two key factors: 1) BERT model variants, and 2) classification strategies. Experiments on three corpora show that a domain-specific pre-training corpus helps the BERT-based classification model identify the type of scientific relations. Although the strategy of predicting a single relation at a time generally achieves higher classification accuracy than the strategy of identifying multiple relation types simultaneously, the latter performs more consistently across corpora with either large or small numbers of annotations. Our study aims to offer recommendations to digital library stakeholders for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.