Paper Title

Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks

Paper Authors

Peng Cui, Le Hu, Yuanchao Liu

Paper Abstract

Text summarization aims to compress a textual document into a short summary while keeping salient information. Extractive approaches are widely used in text summarization because of their fluency and efficiency. However, most existing extractive models hardly capture inter-sentence relationships, particularly in long documents. They also often ignore the effect of topical information on capturing important contents. To address these issues, this paper proposes a graph neural network (GNN)-based extractive summarization model that efficiently captures inter-sentence relationships via a graph-structured document representation. Moreover, our model integrates a joint neural topic model (NTM) to discover latent topics, which provide document-level features for sentence selection. The experimental results demonstrate that our model not only achieves state-of-the-art results on the CNN/DM and NYT datasets but also considerably outperforms existing approaches on scientific paper datasets consisting of much longer documents, indicating its better robustness across document genres and lengths. Further discussions show that topical information can help the model preselect salient contents from an entire document, which explains its effectiveness in long document summarization.
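
To make the architecture described in the abstract more concrete, below is a minimal, self-contained PyTorch sketch of a topic-aware GNN extractor in the same spirit. It is not the authors' implementation: the VAE-style topic model, the dense-adjacency message passing, and all module names, dimensions, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleNTM(nn.Module):
    """VAE-style neural topic model: encodes a bag-of-words document vector
    into a latent topic mixture that serves as a document-level feature."""

    def __init__(self, vocab_size, num_topics=50, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)
        self.decoder = nn.Linear(num_topics, vocab_size)  # reconstructs the BoW

    def forward(self, bow):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        theta = torch.softmax(z, dim=-1)                       # topic mixture
        return theta, self.decoder(theta), mu, logvar


class GraphLayer(nn.Module):
    """One round of message passing over the sentence graph: H' = ReLU(A H W)."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        return F.relu(adj @ self.linear(h))


class TopicAwareExtractor(nn.Module):
    """Scores sentences for extraction from GNN-contextualized sentence vectors
    concatenated with the document's topic mixture."""

    def __init__(self, sent_dim, vocab_size, num_topics=50, gnn_layers=2):
        super().__init__()
        self.ntm = SimpleNTM(vocab_size, num_topics)
        self.gnn = nn.ModuleList([GraphLayer(sent_dim) for _ in range(gnn_layers)])
        self.classifier = nn.Linear(sent_dim + num_topics, 1)

    def forward(self, sent_embs, adj, bow):
        # sent_embs: (num_sents, sent_dim) sentence encodings (e.g., from BERT)
        # adj:       (num_sents, num_sents) normalized sentence-graph adjacency
        # bow:       (vocab_size,) bag-of-words vector of the whole document
        h = sent_embs
        for layer in self.gnn:
            h = layer(h, adj)                    # capture inter-sentence relationships
        theta, recon, mu, logvar = self.ntm(bow.unsqueeze(0))
        topic = theta.expand(h.size(0), -1)      # document-level topic feature
        scores = self.classifier(torch.cat([h, topic], dim=-1)).squeeze(-1)
        return torch.sigmoid(scores), recon, mu, logvar
```

As a rough usage illustration, `TopicAwareExtractor(sent_dim=768, vocab_size=30000)` could be called with precomputed sentence embeddings, a sentence-graph adjacency matrix, and the document's bag-of-words vector; the returned scores rank sentences for extraction, while the reconstruction and KL terms would train the topic model jointly with the extraction loss.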
