Paper Title

On Faithfulness and Factuality in Abstractive Summarization

Authors

Joshua Maynez, Shashi Narayan, Bernd Bohnet, Ryan McDonald

Abstract

It is well known that the standard likelihood training and approximate decoding objectives in neural text generation models lead to less human-like responses for open-ended tasks such as language modeling and story generation. In this paper we have analyzed limitations of these models for abstractive document summarization and found that these models are highly prone to hallucinate content that is unfaithful to the input document. We conducted a large scale human evaluation of several neural abstractive summarization systems to better understand the types of hallucinations they produce. Our human annotators found substantial amounts of hallucinated content in all model generated summaries. However, our analysis does show that pretrained models are better summarizers not only in terms of raw metrics, i.e., ROUGE, but also in generating faithful and factual summaries as evaluated by humans. Furthermore, we show that textual entailment measures better correlate with faithfulness than standard metrics, potentially leading the way to automatic evaluation metrics as well as training and decoding criteria.
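
The abstract's final claim, that textual-entailment measures track faithfulness better than ROUGE, can be illustrated with a small sketch. The code below is not the authors' evaluation pipeline; it assumes a recent version of the Hugging Face `transformers` library and the publicly available `roberta-large-mnli` checkpoint, and the document/summary strings are invented for illustration.

```python
# Minimal sketch: use an off-the-shelf NLI model to estimate whether a source
# document entails a generated summary, as a rough proxy for faithfulness.
# Assumes the `transformers` library and the public `roberta-large-mnli` model.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def entailment_score(document: str, summary: str) -> float:
    """Probability that the document (premise) entails the summary (hypothesis)."""
    # Returns scores for all three MNLI labels: CONTRADICTION, NEUTRAL, ENTAILMENT.
    # Long documents exceed the model's input limit and would need truncation or
    # sentence-level scoring in practice.
    scores = nli({"text": document, "text_pair": summary}, top_k=None)
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

# Invented example: a faithful summary vs. a hallucinated one.
document = "The company reported a 12% rise in quarterly revenue, driven by cloud sales."
faithful = "Quarterly revenue rose 12%, helped by cloud sales."
hallucinated = "The company announced a merger with its largest competitor."

print(entailment_score(document, faithful))      # expected to be relatively high
print(entailment_score(document, hallucinated))  # expected to be relatively low
```

In practice one would likely score each summary sentence against the source and aggregate; the paper's point is only that such entailment probabilities correlate with human faithfulness judgments better than n-gram overlap metrics such as ROUGE do.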
