Paper Title

doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset

Authors

Song Feng, Hui Wan, Chulaka Gunasekara, Siva Sankalp Patel, Sachindra Joshi, Luis A. Lastras

Abstract

We introduce doc2dial, a new dataset of goal-oriented dialogues that are grounded in associated documents. Inspired by how authors compose documents for guiding end users, we first construct dialogue flows based on the content elements that correspond to higher-level relations across text sections as well as lower-level relations between discourse units within a section. We then present these dialogue flows to crowd contributors to create conversational utterances. The dataset includes about 4800 annotated conversations with an average of 14 turns, grounded in over 480 documents from four domains. Compared to prior document-grounded dialogue datasets, this dataset covers a variety of dialogue scenes in information-seeking conversations. To evaluate the versatility of the dataset, we introduce multiple dialogue modeling tasks and present baseline approaches.
