Paper Title
Topic Detection and Summarization of User Reviews
Paper Authors
Paper Abstract
A massive number of reviews is generated daily on various platforms, and it is impossible for people to read through them all and extract useful information. Automatically summarizing customer reviews is thus important for identifying and extracting the essential information that helps users obtain the gist of the data. However, because customer reviews are typically short, informal, and multifaceted, generating topic-wise summaries is extremely challenging. While several studies aim to solve this problem, they rely on heuristic methods developed using customer reviews alone. Unlike existing methods, we propose an effective new summarization method that analyzes both reviews and summaries. To do so, we first segment reviews and summaries into individual sentiments. Because these sentiments are typically short, we combine sentiments that discuss the same aspect into a single document and apply a topic modeling method to identify hidden topics shared by customer reviews and summaries. Sentiment analysis is then used to distinguish positive from negative opinions within each detected topic. We also introduce a classifier that distinguishes the writing pattern of summaries from that of customer reviews. Finally, sentiments are selected for the summary based on their topic relevance, sentiment analysis score, and writing pattern. To evaluate our method, we collected a new dataset from Amazon and CNET comprising product reviews and summaries covering 1,028 products. Experimental results demonstrate the effectiveness of our method compared with other methods.
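The segmentation and final selection steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how reviews are split into sentiments or how the three selection signals are combined, so the punctuation-and-"but" splitting rule, the `Candidate` fields, the `segment` and `select_summary` helpers, and the linear weights `W_TOPIC`, `W_SENT`, `W_STYLE` are all hypothetical. The topic relevance, sentiment score, and summary-style probability are assumed to be produced upstream by the topic model, sentiment analyzer, and writing-pattern classifier, which are not shown.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical segmentation rule (an assumption, not from the paper): split on
# sentence punctuation and on the word "but", so each piece carries one opinion.
_SPLIT = re.compile(r"[.!?;]+|\bbut\b", flags=re.IGNORECASE)


def segment(review: str) -> list[str]:
    """Split a review into short opinion units ("sentiments"), dropping empty pieces."""
    parts = (p.strip(" ,") for p in _SPLIT.split(review))
    return [p for p in parts if p]


@dataclass
class Candidate:
    text: str
    topic: int          # dominant topic index, assumed to come from a topic model
    relevance: float    # probability of that topic for this sentiment, in [0, 1]
    sentiment: float    # polarity score in [-1, 1]; the sign gives positive/negative
    style_prob: float   # assumed classifier probability that the text reads like a summary


# Hypothetical weights: the abstract does not specify how the signals are combined.
W_TOPIC, W_SENT, W_STYLE = 0.5, 0.2, 0.3


def score(c: Candidate) -> float:
    """Linear combination of topic relevance, sentiment strength, and summary style."""
    return W_TOPIC * c.relevance + W_SENT * abs(c.sentiment) + W_STYLE * c.style_prob


def select_summary(cands: list[Candidate], k: int = 1) -> dict[tuple[int, str], list[str]]:
    """Keep the top-k sentiments for every (topic, polarity) group."""
    groups: dict[tuple[int, str], list[Candidate]] = defaultdict(list)
    for c in cands:
        if c.sentiment == 0:
            continue  # neutral sentiments carry no opinion to summarize
        polarity = "pos" if c.sentiment > 0 else "neg"
        groups[(c.topic, polarity)].append(c)
    summary = {}
    for key in sorted(groups):
        ranked = sorted(groups[key], key=score, reverse=True)
        summary[key] = [c.text for c in ranked[:k]]
    return summary


print(segment("Battery lasts all day, but it drains fast when gaming."))
# ['Battery lasts all day', 'it drains fast when gaming']

example = [
    Candidate("battery lasts all day", 0, 0.9, 0.8, 0.7),
    Candidate("battery is ok i guess lol", 0, 0.8, 0.3, 0.1),
    Candidate("battery drains fast when gaming", 0, 0.85, -0.6, 0.6),
    Candidate("screen is bright and sharp", 1, 0.95, 0.7, 0.8),
]
print(select_summary(example))
# {(0, 'neg'): ['battery drains fast when gaming'], (0, 'pos'): ['battery lasts all day'], (1, 'pos'): ['screen is bright and sharp']}
```

Grouping by (topic, polarity) before ranking keeps both positive and negative opinions for each topic, so a strongly scored polarity cannot crowd the other out. The style term favors summary-like phrasing ("battery lasts all day") over informal review text ("battery is ok i guess lol") when relevance is similar.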