Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions -- A Trial Dataset

Paper Title

Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions -- A Trial Dataset

Authors

Jennifer D'Souza, Sören Auer

Abstract

Purpose: This work aims to normalize the NLPCONTRIBUTIONS scheme (henceforward, NLPCONTRIBUTIONGRAPH) for structuring, directly from article sentences, the contribution information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) a pilot stage to define the scheme (described in prior work); and 2) an adjudication stage to normalize the graphing model (the focus of this paper).

Design/methodology/approach: We re-annotate, for a second time, the contribution-pertinent information across 50 previously annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. Specifically, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme.

Findings: Applying NLPCONTRIBUTIONGRAPH to the 50 articles resulted in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements, indicating that the annotation decision variance grows as the granularity of the information increases.

Practical Implications: We demonstrate NLPCONTRIBUTIONGRAPH data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks.
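To make the agreement figures concrete, the following is a minimal sketch of how an F1 score between two annotation stages could be computed when each stage is treated as a set of items (sentences, phrases, or triples). The abstract does not specify the matching criterion, so exact set membership is an assumption here, and the function name `f1_agreement` and the toy triples are hypothetical, not taken from the dataset.

```python
from typing import Hashable, Set


def f1_agreement(stage1: Set[Hashable], stage2: Set[Hashable]) -> float:
    """F1 between two annotation sets, assuming exact-match comparison.

    F1 is symmetric, so it does not matter which stage is the reference.
    """
    if not stage1 and not stage2:
        return 1.0  # two empty annotation sets agree trivially
    overlap = len(stage1 & stage2)
    if overlap == 0:
        return 0.0
    precision = overlap / len(stage1)
    recall = overlap / len(stage2)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (subject, predicate, object) triples for one article.
pilot = {
    ("Contribution", "has", "Model"),
    ("Model", "uses", "BiLSTM"),
    ("Model", "has", "attention"),
}
adjudicated = {
    ("Contribution", "has", "Model"),
    ("Model", "uses", "BiLSTM"),
    ("BiLSTM", "has", "attention layer"),
}

score = f1_agreement(pilot, adjudicated)
print(f"triple-level F1 agreement: {score:.4f}")
```

Under exact matching, a triple counts as agreed only if all three of its parts coincide, whereas a sentence needs only one selection decision to match. This is one plausible reading of why the reported agreement drops from sentences to phrases to triples.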
