Scientific Paper Extractive Summarization Enhanced by Citation Graphs

Paper Title

Scientific Paper Extractive Summarization Enhanced by Citation Graphs

Authors

Xiuying Chen, Mingzhe Li, Shen Gao, Rui Yan, Xin Gao, Xiangliang Zhang

Abstract

In a citation graph, adjacent paper nodes share related scientific terms and topics. The graph thus conveys unique structural information about document-level relatedness that can be used in the paper summarization task to go beyond intra-document information. In this work, we focus on leveraging citation graphs to improve extractive summarization of scientific papers under different settings. We first propose a Multi-granularity Unsupervised Summarization model (MUS) as a simple, low-cost solution to the task. MUS fine-tunes a pre-trained encoder model on the citation graph via link prediction. Then, summary sentences are extracted from the corresponding paper using multi-granularity information. Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework. Motivated by this, we next propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results when large-scale labeled data are available. Apart from employing link prediction as an auxiliary task, GSS introduces a gated sentence encoder and a graph information fusion module that use the graph information to refine sentence representations. Experiments on a public benchmark dataset show that MUS and GSS bring substantial improvements over the prior state-of-the-art model.
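The abstract says MUS fine-tunes a pre-trained encoder on the citation graph via link prediction, and GSS also uses link prediction as an auxiliary task, but it does not give the exact formulation. The following is therefore only a minimal generic sketch of a link-prediction objective, not the authors' implementation. It assumes paper embeddings have already been produced by some encoder (omitted here), scores a candidate citation pair by the dot product of the two embeddings, draws negative pairs uniformly from non-cited pairs, and averages binary cross-entropy over positives and negatives. The names `link_prediction_loss` and `sample_negatives` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Edge = Tuple[str, str]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product, used here as the (assumed) score of a candidate citation edge."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_negatives(
    edges: List[Edge], nodes: List[str], k: int, rng: random.Random, max_tries: int = 10_000
) -> List[Edge]:
    """Draw up to k ordered pairs of distinct papers that are NOT citation edges.

    Assumption: uniform negative sampling; edges are treated as undirected.
    max_tries guards against looping forever on a (near-)complete graph.
    """
    existing = set(edges) | {(v, u) for u, v in edges}
    negatives: List[Edge] = []
    for _ in range(max_tries):
        if len(negatives) == k:
            break
        u, v = rng.sample(nodes, 2)  # two distinct papers
        if (u, v) not in existing:
            negatives.append((u, v))
    return negatives


def link_prediction_loss(
    emb: Dict[str, Vector], positives: List[Edge], negatives: List[Edge]
) -> float:
    """Mean binary cross-entropy: cited pairs get label 1, sampled non-cited pairs label 0."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for u, v in positives:
        total -= math.log(sigmoid(dot(emb[u], emb[v])) + eps)
    for u, v in negatives:
        total -= math.log(1.0 - sigmoid(dot(emb[u], emb[v])) + eps)
    return total / (len(positives) + len(negatives))


if __name__ == "__main__":
    # Toy graph: A cites B, C cites D; embeddings are made up for illustration.
    nodes = ["A", "B", "C", "D"]
    citations = [("A", "B"), ("C", "D")]
    emb = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.1, 0.9]}
    negs = sample_negatives(citations, nodes, k=2, rng=random.Random(0))
    print(negs, link_prediction_loss(emb, citations, negs))
```

In actual fine-tuning, this loss would be backpropagated into the encoder that produces `emb`; a framework with automatic differentiation would replace the plain-Python arithmetic used here for readability.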
