Paper Title


CUED_speech at TREC 2020 Podcast Summarisation Track

Paper Authors

Potsawee Manakul, Mark Gales

Paper Abstract


In this paper, we describe our approach for the Podcast Summarisation challenge in TREC 2020. Given a podcast episode with its transcription, the goal is to generate a summary that captures the most important information in the content. Our approach consists of two steps: (1) Filtering redundant or less informative sentences in the transcription using the attention of a hierarchical model; (2) Applying a state-of-the-art text summarisation system (BART) fine-tuned on the Podcast data using a sequence-level reward function. Furthermore, we perform ensembles of three and nine models for our submission runs. We also fine-tune the BART model on the Podcast data as our baseline. The human evaluation by NIST shows that our best submission achieves 1.777 on the EGFB scale, while the creator-provided descriptions score 1.291. Our system won the Spotify Podcast Summarisation Challenge in the TREC 2020 Podcast Track in both human and automatic evaluation.
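
The abstract describes a two-step pipeline: first filter the long transcript down to its most informative sentences, then summarise the filtered text with a fine-tuned BART model. Below is a minimal sketch of that pipeline, not the authors' implementation: the `select_sentences` stub is a hypothetical stand-in for their hierarchical-attention filtering, and `facebook/bart-large-cnn` is a generic placeholder checkpoint rather than their Podcast-fine-tuned, reward-trained model.

```python
# Illustrative sketch of a filter-then-summarise pipeline (not the paper's code).
from transformers import BartTokenizer, BartForConditionalGeneration

MODEL_NAME = "facebook/bart-large-cnn"  # placeholder; the paper fine-tunes BART on Podcast data
tokenizer = BartTokenizer.from_pretrained(MODEL_NAME)
model = BartForConditionalGeneration.from_pretrained(MODEL_NAME)

def select_sentences(sentences, keep_ratio=0.5):
    """Hypothetical stand-in for step (1): the paper scores transcript sentences
    with the attention of a hierarchical model and drops redundant or less
    informative ones. Here we simply keep the leading fraction of sentences."""
    n_keep = max(1, int(len(sentences) * keep_ratio))
    return sentences[:n_keep]

def summarise(transcript_sentences, max_input_tokens=1024):
    # Step 1: filter the transcript so it fits within BART's input limit.
    filtered_text = " ".join(select_sentences(transcript_sentences))
    # Step 2: generate an abstractive summary with the (fine-tuned) BART model.
    inputs = tokenizer(filtered_text, truncation=True,
                       max_length=max_input_tokens, return_tensors="pt")
    summary_ids = model.generate(inputs["input_ids"],
                                 num_beams=4, max_length=128)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

In the paper's full system, step (2) is additionally trained with a sequence-level reward and the submission runs ensemble three or nine such models; those parts are omitted from this sketch.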
