Paper Title
Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models
Paper Authors
Paper Abstract
Pre-trained sequence-to-sequence (seq-to-seq) models have significantly improved the accuracy of several language generation tasks, including abstractive summarization. Although the fluency of abstractive summarization has been greatly improved by fine-tuning these models, it is not clear whether they can also identify the important parts of the source text to be included in the summary. In this study, we investigated the effectiveness of combining saliency models that identify the important parts of the source text with the pre-trained seq-to-seq models through extensive experiments. We also proposed a new combination model consisting of a saliency model that extracts a token sequence from a source text and a seq-to-seq model that takes the sequence as an additional input text. Experimental results showed that most of the combination models outperformed a simple fine-tuned seq-to-seq model on both the CNN/DM and XSum datasets, even when the seq-to-seq model was pre-trained on large-scale corpora. Moreover, for the CNN/DM dataset, the proposed combination model exceeded the previous best model by 1.33 points on ROUGE-L.
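To make the "extract-then-generate" combination described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a simple BiGRU-based saliency model scores source tokens, the top-k tokens (a hypothetical selection heuristic) are appended after a separator, and the combined sequence would then be fed to a fine-tuned pre-trained seq-to-seq model such as BART. The model sizes, the top-k value, and the separator convention are all assumptions.

```python
# Minimal sketch of a saliency model + combined-input construction.
# Assumptions (not from the paper): BiGRU scorer, top-k token selection,
# a single separator token between source and extracted tokens.
import torch
import torch.nn as nn


class SaliencyModel(nn.Module):
    """Scores each source token; higher score = more salient for the summary."""

    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, src_ids: torch.Tensor) -> torch.Tensor:
        # src_ids: (batch, src_len) -> saliency scores: (batch, src_len)
        states, _ = self.encoder(self.embed(src_ids))
        return self.score(states).squeeze(-1)


def build_combined_input(src_ids: torch.Tensor,
                         saliency_scores: torch.Tensor,
                         sep_id: int,
                         top_k: int = 32) -> torch.Tensor:
    """Append the top-k salient tokens (kept in source order) to the source."""
    k = min(top_k, src_ids.size(1))
    _, top_idx = torch.topk(saliency_scores, k=k, dim=1)
    top_idx, _ = torch.sort(top_idx, dim=1)              # preserve source order
    extracted = torch.gather(src_ids, 1, top_idx)        # (batch, k)
    sep = torch.full((src_ids.size(0), 1), sep_id, dtype=src_ids.dtype)
    # The result would serve as the additional-input text for the seq-to-seq model.
    return torch.cat([src_ids, sep, extracted], dim=1)


if __name__ == "__main__":
    vocab_size, sep_id = 1000, 999
    src_ids = torch.randint(0, vocab_size - 1, (2, 64))  # toy batch of 2 sources
    scores = SaliencyModel(vocab_size)(src_ids)
    combined = build_combined_input(src_ids, scores, sep_id, top_k=16)
    print(combined.shape)  # (2, 64 + 1 + 16)
```

In practice, `combined` would be passed as `input_ids` to a pre-trained seq-to-seq model being fine-tuned for summarization; how the saliency model is supervised (e.g., with pseudo extractive labels) is left unspecified in this sketch.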