Learning to Respond with Your Favorite Stickers: A Framework of Unifying Multi-Modality and User Preference in Multi-Turn Dialog

Paper title

Learning to Respond with Your Favorite Stickers: A Framework of Unifying Multi-Modality and User Preference in Multi-Turn Dialog

Authors

Gao, Shen; Chen, Xiuying; Liu, Li; Zhao, Dongyan; Yan, Rui

Abstract

Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps, and some works automatically select a sticker response by matching sticker images with previous utterances. However, existing methods usually focus on measuring the matching degree between the dialog context and the sticker image, and ignore the user's preference in using stickers. Hence, in this paper, we propose to recommend an appropriate sticker to a user based on the multi-turn dialog context and the user's sticker usage history. This task poses two main challenges. One is to model the user's sticker preference based on the previous sticker selection history. The other is to jointly fuse the user preference and the matching between the dialog context and the candidate sticker into the final prediction. To tackle these challenges, we propose a Preference Enhanced Sticker Response Selector (PESRS) model. Specifically, PESRS first employs a convolution-based sticker image encoder and a self-attention-based multi-turn dialog encoder to obtain representations of the stickers and utterances. Next, a deep interaction network is proposed to conduct deep matching between the sticker and each utterance. Then, we model the user preference by using the recently selected stickers as input, and use a key-value memory network to store the preference representation. PESRS then learns the short-term and long-term dependencies between all interaction results with a fusion network, and dynamically fuses the user preference representation into the final sticker selection prediction. Extensive experiments conducted on a large-scale real-world dialog dataset show that our model achieves state-of-the-art performance on all commonly used metrics. Experiments also verify the effectiveness of each component of PESRS.
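
As a rough illustration of the key-value memory idea mentioned in the abstract, the sketch below reads a preference vector from a small memory with attention and mixes a preference score into a context-matching score. It is not the authors' implementation: the function names (`read_preference`, `fuse`), the dot-product scoring, and the fixed `gate` weight are all assumptions made for illustration, and the abstract describes the fusion as dynamic rather than fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def read_preference(query, keys, values):
    """Hypothetical key-value memory read.

    Each memory slot stands for one recently selected sticker:
    `keys` are matched against the query, `values` are averaged
    with the resulting attention weights.
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def fuse(match_score, preference, candidate, gate=0.5):
    """Mix a context-matching score with a preference score.

    Assumption: a fixed scalar gate; a learned, input-dependent
    gate would be needed for truly dynamic fusion.
    """
    pref_score = dot(preference, candidate)
    return gate * match_score + (1.0 - gate) * pref_score
```

With two identical keys, the read returns the plain average of the two values, which is a quick sanity check that the attention weights are normalized.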
