Paper Title

AREDSUM: Adaptive Redundancy-Aware Iterative Sentence Ranking for Extractive Document Summarization

Paper Authors

Keping Bi, Rahul Jha, W. Bruce Croft, Asli Celikyilmaz

Paper Abstract

Redundancy-aware extractive summarization systems score the redundancy of the sentences to be included in a summary either jointly with their salience information or separately as an additional sentence scoring step. Previous work shows the efficacy of jointly scoring and selecting sentences with neural sequence generation models. It is, however, not well-understood if the gain is due to better encoding techniques or better redundancy reduction approaches. Similarly, the contribution of salience versus diversity components on the created summary is not studied well. Building on the state-of-the-art encoding methods for summarization, we present two adaptive learning models: AREDSUM-SEQ that jointly considers salience and novelty during sentence selection; and a two-step AREDSUM-CTX that scores salience first, then learns to balance salience and redundancy, enabling the measurement of the impact of each aspect. Empirical results on CNN/DailyMail and NYT50 datasets show that by modeling diversity explicitly in a separate step, AREDSUM-CTX achieves significantly better performance than AREDSUM-SEQ as well as state-of-the-art extractive summarization baselines.
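The two-step pattern the abstract describes — score salience first, then select sentences while penalizing redundancy against what has already been chosen — can be illustrated with a minimal MMR-style greedy sketch. This is a hypothetical illustration, not the paper's learned neural model: the `overlap` measure, the salience scores, and the balance weight `lam` are all stand-in assumptions.

```python
# Hypothetical sketch of redundancy-aware iterative sentence ranking.
# Sentences come pre-scored for salience; each step picks the sentence that
# best trades salience off against word overlap with already-chosen sentences.
# This is an MMR-style illustration, NOT the paper's AREDSUM-CTX model.

def overlap(a, b):
    """Fraction of words in sentence `a` that also appear in sentence `b`."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa), 1)

def iterative_select(sentences, salience, k=2, lam=0.5):
    """Greedily pick k sentences, weighting salience by lam and
    penalizing redundancy (max overlap with chosen sentences) by 1 - lam."""
    chosen, remaining = [], list(range(len(sentences)))
    while remaining and len(chosen) < k:
        def score(i):
            red = max((overlap(sentences[i], sentences[j]) for j in chosen),
                      default=0.0)
            return lam * salience[i] - (1 - lam) * red
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return [sentences[i] for i in chosen]

docs = ["the cat sat on the mat",
        "a cat was sitting on a mat",
        "stocks rallied on strong earnings"]
sal = [0.9, 0.85, 0.6]
# The second-most-salient sentence is skipped because it is nearly
# redundant with the first pick.
print(iterative_select(docs, sal, k=2))
# → ['the cat sat on the mat', 'stocks rallied on strong earnings']
```

A joint model in the AREDSUM-SEQ style would instead fold both signals into a single learned score per step; separating them, as above, is what lets the contribution of each aspect be measured.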
