Paper title

StructSum: Summarization via Structured Representations

Authors

Vidhisha Balachandran, Artidoro Pagnoni, Jay Yoon Lee, Dheeraj Rajagopal, Jaime Carbonell, Yulia Tsvetkov

Abstract

Abstractive text summarization aims to compress the information of a long source document into a rephrased, condensed summary. Despite advances in modeling techniques, abstractive summarization models still suffer from several key challenges: (i) layout bias: they overfit to the style of training corpora; (ii) limited abstractiveness: they are optimized to copy n-grams from the source rather than to generate novel abstractive summaries; (iii) lack of transparency: they are not interpretable. In this work, we propose a framework based on document-level structure induction for summarization to address these challenges. To this end, we propose incorporating latent and explicit dependencies across sentences in the source document into end-to-end single-document summarization models. Our framework complements standard encoder-decoder summarization models by augmenting them with rich structure-aware document representations based on implicitly learned (latent) structures and externally derived linguistic (explicit) structures. We show that our summarization framework, trained on the CNN/DM dataset, improves the coverage of content in the source documents, generates more abstractive summaries containing more novel n-grams, and incorporates interpretable sentence-level structures, while performing on par with standard baselines.
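
The abstract measures abstractiveness by novel n-grams but does not say how they are counted. A common reading is the share of summary n-grams that never appear in the source document. The sketch below illustrates that reading only. The function names `ngrams` and `novel_ngram_fraction` and the lowercase whitespace tokenization are assumptions made for illustration, not the authors' evaluation code.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novel_ngram_fraction(source: str, summary: str, n: int) -> float:
    """Fraction of summary n-grams that never occur in the source.

    Assumption: tokens are lowercased and split on whitespace; a real
    evaluation would likely use a proper tokenizer.
    """
    source_ngrams: Set[Tuple[str, ...]] = set(ngrams(source.lower().split(), n))
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        # Summary shorter than n tokens: no n-grams to judge.
        return 0.0
    novel = [g for g in summary_ngrams if g not in source_ngrams]
    return len(novel) / len(summary_ngrams)


if __name__ == "__main__":
    src = "the cat sat on the mat"
    hyp = "the cat rested on the mat"
    for n in (1, 2, 3):
        print(n, novel_ngram_fraction(src, hyp, n))
```

Under this definition a summary copied verbatim from the source scores 0 for every n, while heavier rephrasing pushes the score toward 1, which is why a higher value is read as more abstractive.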
