A Dataset for Tracking Entities in Open Domain Procedural Text

Paper title

A Dataset for Tracking Entities in Open Domain Procedural Text

Authors

Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy

Abstract

We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and the entities involved, and ask how those entities change for just a small, pre-defined set of attributes (e.g., location), limiting their fidelity. Our solution is a new task formulation where, given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary. Using crowdsourcing, we create OPENPI, a high-quality (91.5% coverage as judged by humans and completely vetted) and large-scale dataset comprising 29,928 state changes over 4,050 sentences from 810 procedural real-world paragraphs from WikiHow.com. A current state-of-the-art generation model on this task achieves 16.1% F1 based on the BLEU metric, leaving enough room for novel model architectures.
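The tuple format described in the abstract can be pictured with a minimal Python sketch. The class name `StateChange`, its field names, and the attribute value "clarity" are assumptions made for illustration and are not taken from the dataset or its released code. Only the entity (a car window), the states (foggy, clear), and the dataset sizes come from the abstract. The two averages are derived here from those sizes and are not figures reported by the authors.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StateChange:
    """One state change tuple: (entity, attribute, before-state, after-state).

    Hypothetical container; the field names are illustrative assumptions.
    """
    entity: str
    attribute: str
    before: str
    after: str

    def as_tuple(self) -> Tuple[str, str, str, str]:
        # Return the fields in the order used in the abstract.
        return (self.entity, self.attribute, self.before, self.after)


# Illustrative output for one step of the fog-removal text.
# The attribute name "clarity" is invented for this sketch; in the task,
# entity, attribute, and state values are all free-form (open vocabulary).
step_changes = [
    StateChange("car window", "clarity", "foggy", "clear"),
]

# Dataset sizes as stated in the abstract.
NUM_STATE_CHANGES = 29_928
NUM_SENTENCES = 4_050
NUM_PARAGRAPHS = 810

# Derived averages (computed here, not reported in the abstract).
sentences_per_paragraph = NUM_SENTENCES / NUM_PARAGRAPHS
changes_per_sentence = NUM_STATE_CHANGES / NUM_SENTENCES

for change in step_changes:
    print(change.as_tuple())
print(sentences_per_paragraph, round(changes_per_sentence, 2))
```

Because every field is a plain string rather than a member of a fixed label set, a model for this task has to generate the values instead of classifying over a pre-defined list of attributes.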
