Paper Title

Towards Better Understanding of User Satisfaction in Open-Domain Conversational Search

Paper Authors

Zhumin Chu, Qingyao Ai, Zhihong Wang, Yiqun Liu, Yingye Huang, Rui Zhang, Min Zhang, Shaoping Ma

Abstract

With the increasing popularity of conversational search, how to evaluate the performance of conversational search systems has become an important question in the IR community. Existing works on conversational search evaluation can mainly be categorized into two streams: (1) constructing metrics based on semantic similarity (e.g., BLEU, METEOR, and BERTScore), or (2) directly evaluating the response ranking performance of the system using traditional search metrics (e.g., nDCG, RBP, and nERR). However, these methods either ignore the information need of the user or ignore the mixed-initiative property of conversational search. This raises the question of how to accurately model user satisfaction in conversational search scenarios. Since explicitly asking users to provide satisfaction feedback is difficult, traditional IR studies often rely on the Cranfield paradigm (i.e., third-party annotation) and user behavior modeling to estimate user satisfaction in search. However, the feasibility and effectiveness of these two approaches have not been fully explored in conversational search. In this paper, we dive into the evaluation of conversational search from the perspective of user satisfaction. We build a novel conversational search experimental platform and construct a Chinese open-domain conversational search behavior dataset containing rich annotations and search behavior data. We also collect third-party satisfaction annotations at the session level and turn level to investigate the feasibility of the Cranfield paradigm in the conversational search scenario. Experimental results show both some consistency and considerable differences between the user satisfaction annotations and third-party annotations. We also propose the dialog continuation or ending behavior model (DCEBM) to capture session-level user satisfaction based on turn-level information.
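The abstract contrasts semantic-similarity metrics (BLEU, METEOR, BERTScore) with traditional ranking metrics such as nDCG. As a point of reference, here is a minimal sketch of the standard nDCG computation (this is generic illustrative code, not code released with the paper):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: graded relevance discounted by log2 of rank."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    """Normalized DCG: DCG of the ranking divided by DCG of the ideal ranking.

    `relevances` are the graded relevance labels of the results in ranked order.
    """
    ideal = sorted(relevances, reverse=True)
    ranked = relevances
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

An ideally ordered list scores 1.0, and any misordering lowers the score; as the paper notes, such metrics measure ranking quality of individual responses rather than the user's overall satisfaction with a mixed-initiative session.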
