Paper Title
Improving Truthfulness of Headline Generation
Paper Authors
Paper Abstract
Most studies on abstractive summarization report ROUGE scores between system and reference summaries. However, we have a concern about the truthfulness of generated summaries: whether all facts of a generated summary are mentioned in the source text. This paper explores improving the truthfulness in headline generation on two popular datasets. Analyzing headlines generated by the state-of-the-art encoder-decoder model, we show that the model sometimes generates untruthful headlines. We conjecture that one of the reasons lies in untruthful supervision data used for training the model. In order to quantify the truthfulness of article-headline pairs, we consider the textual entailment of whether an article entails its headline. After confirming quite a few untruthful instances in the datasets, this study hypothesizes that removing untruthful instances from the supervision data may remedy the problem of the untruthful behaviors of the model. Building a binary classifier that predicts an entailment relation between an article and its headline, we filter out untruthful instances from the supervision data. Experimental results demonstrate that the headline generation model trained on filtered supervision data shows no clear difference in ROUGE scores but remarkable improvements in automatic and manual evaluations of the generated headlines.
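The filtering step described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes an off-the-shelf NLI model from Hugging Face transformers (roberta-large-mnli) in place of the binary entailment classifier the paper builds, and the input format and score threshold are hypothetical.

```python
# Minimal sketch of entailment-based filtering of headline-generation
# supervision data. Assumptions (not from the paper): an off-the-shelf NLI
# model ("roberta-large-mnli") stands in for the paper's own classifier,
# and article-headline pairs arrive as a list of dicts.
from transformers import pipeline


def filter_supervision_data(pairs, threshold=0.5):
    """Keep only article-headline pairs whose article entails its headline."""
    nli = pipeline("text-classification", model="roberta-large-mnli")
    # Premise = article, hypothesis = headline.
    inputs = [{"text": p["article"], "text_pair": p["headline"]} for p in pairs]
    predictions = nli(inputs)
    return [
        pair
        for pair, pred in zip(pairs, predictions)
        if pred["label"] == "ENTAILMENT" and pred["score"] >= threshold
    ]


if __name__ == "__main__":
    data = [
        {"article": "The company reported a 10 percent rise in quarterly profit.",
         "headline": "Company profit up 10 percent in quarter"},
        {"article": "The mayor attended the opening of a new library on Monday.",
         "headline": "Mayor resigns amid scandal"},  # untruthful pair
    ]
    for kept in filter_supervision_data(data):
        print(kept["headline"])
```

The filtered pairs would then serve as training data for the encoder-decoder headline generation model; the threshold controls how aggressively untruthful instances are discarded and would need tuning in practice.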