Paper Title
NarraSum: A Large-Scale Dataset for Abstractive Narrative Summarization
Paper Authors
Paper Abstract
Narrative summarization aims to produce a distilled version of a narrative that describes its most salient events and characters. Summarizing a narrative is challenging because it requires an understanding of event causality and character behaviors. To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset. It contains 122K narrative documents, collected from plot descriptions of movies and TV episodes across diverse genres, together with their corresponding abstractive summaries. Experiments show that there is a large performance gap between humans and state-of-the-art summarization models on NarraSum. We hope that this dataset will promote future research in summarization, as well as broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum.