TURL: Table Understanding through Representation Learning

Paper title

TURL: Table Understanding through Representation Learning

Authors

Xiang Deng, Huan Sun, Alyssa Lees, You Wu, Cong Yu

Abstract

Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heavily-engineered task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables. During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner. Its universal model design with pre-trained representations can be applied to a wide range of tasks with minimal task-specific fine-tuning. Specifically, we propose a structure-aware Transformer encoder to model the row-column structure of relational tables, and present a new Masked Entity Recovery (MER) objective for pre-training to capture the semantics and knowledge in large-scale unlabeled data. We systematically evaluate TURL with a benchmark consisting of 6 different tasks for table understanding (e.g., relation extraction, cell filling). We show that TURL generalizes well to all tasks and substantially outperforms existing methods in almost all instances.
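
To make the idea of a structure-aware encoder concrete, here is a minimal sketch of one way row-column structure could be turned into an attention mask. The abstract does not spell out the mechanism, so the rule used here (a cell may attend only to cells in its own row or column) and the helper name `row_column_visibility` are illustrative assumptions, not the paper's confirmed implementation.

```python
from typing import List, Tuple


def row_column_visibility(n_rows: int, n_cols: int) -> List[List[bool]]:
    """Build a boolean attention mask over table cells in row-major order.

    mask[i][j] is True when cell i may attend to cell j. The rule assumed
    here (hypothetical, for illustration): two cells are mutually visible
    only if they share a row or a column.
    """
    cells: List[Tuple[int, int]] = [
        (r, c) for r in range(n_rows) for c in range(n_cols)
    ]
    return [
        [ri == rj or ci == cj for (rj, cj) in cells]
        for (ri, ci) in cells
    ]


# A 2x3 table flattens to 6 cells: index 0 is (row 0, col 0),
# index 3 is (row 1, col 0), index 4 is (row 1, col 1).
mask = row_column_visibility(2, 3)
```

In such a scheme the mask would be supplied to the Transformer's self-attention so that unrelated cells (for example, index 0 and index 4 above) do not exchange information directly.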
