Paper Title
App-Aware Response Synthesis for User Reviews
Paper Authors
Paper Abstract
Responding to user reviews promptly and satisfactorily improves application ratings, which is key to application popularity and success. The proliferation of such reviews makes it virtually impossible for developers to keep up with responding manually. To address this challenge, recent work has shown the possibility of automatic response generation. However, because the training review-response pairs are aggregated from many different apps, it remains challenging for such models to generate app-specific responses, which, on the other hand, are often desirable as apps have different features and concerns. Solving the challenge by simply building a model per app (i.e., training with review-response pairs of a single app) may be insufficient because individual apps have limited review-response pairs, and such pairs typically lack the relevant information needed to respond to a new review. To enable app-specific response generation, this work proposes AARSynth: an app-aware response synthesis system. The key idea behind AARSynth is to augment the seq2seq model with information specific to a given app. Given a new user review, it first retrieves the top-K most relevant app reviews and the most relevant snippet from the app description. The retrieved information and the new user review are then fed into a fused machine learning model that integrates the seq2seq model with a machine reading comprehension model. The latter helps digest the retrieved reviews and app description. Finally, the fused model generates a response that is customized to the given app. We evaluated AARSynth using a large corpus of reviews and responses from Google Play. The results show that AARSynth outperforms the state-of-the-art system by 22.2% on BLEU-4 score. Furthermore, our human study shows that AARSynth produces a statistically significant improvement in response quality compared to the state-of-the-art system.
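The retrieval step described in the abstract (top-K most relevant app reviews, plus the single most relevant snippet from the app description) can be illustrated with a minimal sketch. The abstract does not say how relevance is scored, so the TF-IDF cosine similarity used here is an assumption, and the names `tokenize`, `tfidf_vectors`, `cosine`, and `retrieve_top_k` are hypothetical helpers rather than part of AARSynth. The fused seq2seq and machine reading comprehension model that consumes the retrieved text is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query, candidates, k):
    """Return the k candidates most similar to the query.

    Assumption: relevance is TF-IDF cosine similarity. Ties are broken
    by the candidate's original position.
    """
    docs = [tokenize(c) for c in candidates] + [tokenize(query)]
    vectors = tfidf_vectors(docs)
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidates[i] for _, i in scored[:k]]


# Illustrative data only; not taken from the paper's corpus.
past_reviews = [
    "The app crashes when I open the camera",
    "Love the new dark theme",
    "Camera crashes after the latest update",
    "Please add a widget",
]
description_snippets = [
    "Edit photos with filters.",
    "Sync your notes across devices.",
    "Use the camera to scan documents.",
]
new_review = "camera keeps crashing on startup"

relevant_reviews = retrieve_top_k(new_review, past_reviews, k=2)
relevant_snippet = retrieve_top_k(new_review, description_snippets, k=1)[0]
```

In this sketch, `relevant_reviews` and `relevant_snippet` stand in for the retrieved information that, together with the new review, would be passed to the fused model. Because the tokenizer does no stemming, "crashing" and "crashes" do not match; a real system would likely need a stronger notion of relevance.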