Paper Title
A Baseline Analysis for Podcast Abstractive Summarization
Authors
Abstract
A podcast's summary is an important factor in end users' listening decisions and is often considered a critical feature in podcast recommendation systems, as well as in many downstream applications. Existing abstractive summarization approaches are mainly built on models fine-tuned on professionally edited texts such as CNN and DailyMail news. Unlike news, podcasts are often longer, more colloquial and conversational, and noisier, with content such as commercials and sponsorship messages, which makes automatic podcast summarization extremely challenging. This paper presents a baseline analysis of podcast summarization using the Spotify Podcast Dataset provided by TREC 2020. It aims to help researchers understand current state-of-the-art pre-trained models and thereby build a foundation for creating better models.