Paper Title

MIMICS: A Large-Scale Data Collection for Search Clarification

Paper Authors

Hamed Zamani, Gord Lueck, Everest Chen, Rodolfo Quispe, Flint Luu, Nick Craswell

Abstract

Search clarification has recently attracted much attention due to its applications in search engines. It has also been recognized as a major component in conversational information seeking systems. Despite its importance, the research community still lacks large-scale data for studying different aspects of search clarification. In this paper, we introduce MIMICS, a collection of search clarification datasets for real web search queries sampled from the Bing query logs. Each clarification in MIMICS is generated by a Bing production algorithm and consists of a clarifying question and up to five candidate answers. MIMICS contains three datasets: (1) MIMICS-Click includes over 400k unique queries, their associated clarification panes, and the corresponding aggregated user interaction signals (i.e., clicks). (2) MIMICS-ClickExplore is an exploration dataset that includes aggregated user interaction signals for over 60k unique queries, each with multiple clarification panes. (3) MIMICS-Manual includes over 2k unique real search queries. Each query-clarification pair in this dataset has been manually labeled by at least three trained annotators. It contains graded quality labels for the clarifying question, the candidate answer set, and the landing result page for each candidate answer. MIMICS is publicly available for research purposes, thus enabling researchers to study a number of tasks related to search clarification, including clarification generation and selection, user engagement prediction for clarification, click models for clarification, and analyzing user interactions with search clarification.
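In the abstract, each clarification pane is a clarifying question with up to five candidate answers. In MIMICS-Click, each pane is also paired with aggregated click signals. The Python sketch below models that record and loads it from tab-separated text.

This is a minimal sketch under stated assumptions. The TSV layout and the column names (`query`, `question`, `option_1` to `option_5`, `option_cctr_1` to `option_cctr_5`) are assumptions made for illustration. They are not a confirmed description of the released files, so check the official data documentation before relying on them. The class `ClarificationPane` and the functions `parse_row`, `load_panes` and `top_answer` are hypothetical helpers. The sample row is invented toy data, not taken from MIMICS.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_ANSWERS = 5  # the abstract states "up to five candidate answers"


@dataclass
class ClarificationPane:
    """One query-clarification pair: a clarifying question plus 1..5 candidate answers."""
    query: str
    question: str
    answers: List[str]
    # Hypothetical per-answer aggregated click-through rate (MIMICS-Click-style data);
    # None when the input has no click columns (e.g. manually labeled data).
    answer_ctr: Optional[List[float]] = None

    def __post_init__(self) -> None:
        if not 1 <= len(self.answers) <= MAX_ANSWERS:
            raise ValueError(f"expected 1..{MAX_ANSWERS} answers, got {len(self.answers)}")
        if self.answer_ctr is not None and len(self.answer_ctr) != len(self.answers):
            raise ValueError("answer_ctr must have one entry per answer")

    def top_answer(self) -> Optional[str]:
        """Return the answer with the highest click-through rate, if click data exists."""
        if not self.answer_ctr:
            return None
        best = max(range(len(self.answers)), key=lambda i: self.answer_ctr[i])
        return self.answers[best]


def parse_row(row: Dict[str, Optional[str]]) -> ClarificationPane:
    """Build a pane from one TSV row; ASSUMED column names, see lead-in."""
    answers: List[str] = []
    ctrs: List[float] = []
    for i in range(1, MAX_ANSWERS + 1):
        option = (row.get(f"option_{i}") or "").strip()
        if not option:
            continue  # empty slot: the pane has fewer than five answers
        answers.append(option)
        ctr = row.get(f"option_cctr_{i}")
        ctrs.append(float(ctr) if ctr not in (None, "") else 0.0)
    has_ctr = any(f"option_cctr_{i}" in row for i in range(1, MAX_ANSWERS + 1))
    return ClarificationPane(row["query"], row["question"], answers, ctrs if has_ctr else None)


def load_panes(tsv_text: str) -> List[ClarificationPane]:
    """Parse tab-separated text with a header line into clarification panes."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [parse_row(row) for row in reader]


# Invented toy example (NOT real MIMICS data) using the assumed column layout.
HEADER = (["query", "question"]
          + [f"option_{i}" for i in range(1, 6)]
          + [f"option_cctr_{i}" for i in range(1, 6)])
ROW = ["jaguar", "Which one do you mean?", "animal", "car", "sports team", "", "",
       "0.6", "0.3", "0.1", "", ""]
SAMPLE_TSV = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"

panes = load_panes(SAMPLE_TSV)
```

If the input has no `option_cctr_*` columns, `answer_ctr` is left as `None`. That case corresponds to a manually labeled file in the style of MIMICS-Manual, whose graded labels this sketch does not model. Validating the one-to-five answer bound in `__post_init__` means malformed rows fail at load time instead of surfacing later in a click model.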