Paper Title

Ultra-fine Entity Typing with Indirect Supervision from Natural Language Inference

Paper Authors

Bangzheng Li, Wenpeng Yin, Muhao Chen

Paper Abstract

The task of ultra-fine entity typing (UFET) seeks to predict diverse and free-form words or phrases that describe the appropriate types of entities mentioned in sentences. A key challenge for this task lies in the large number of types and the scarcity of annotated data per type. Existing systems formulate the task as a multi-way classification problem and train directly or distantly supervised classifiers. This causes two issues: (i) the classifiers do not capture type semantics, since types are often converted into indices; (ii) systems developed in this way are limited to predicting within a pre-defined type set, and often fall short of generalizing to types that are rarely seen or unseen in training. This work presents LITE, a new approach that formulates entity typing as a natural language inference (NLI) problem, making use of (i) indirect supervision from NLI to infer type information meaningfully represented as textual hypotheses, which alleviates the data scarcity issue, as well as (ii) a learning-to-rank objective to avoid pre-defining a type set. Experiments show that, with limited training data, LITE obtains state-of-the-art performance on the UFET task. In addition, LITE demonstrates strong generalizability: it not only yields the best results on other fine-grained entity typing benchmarks but, more importantly, works well as a pre-trained system on new data containing unseen types.
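To make the formulation concrete, here is a minimal sketch of the core idea described in the abstract: casting entity typing as NLI by treating the sentence as a premise and each candidate type as a textual hypothesis, then ranking types by entailment score. This is not the authors' released code; it assumes an off-the-shelf MNLI checkpoint (`roberta-large-mnli`) and a simple "{mention} is a {type}." hypothesis template, whereas the paper's exact model, templates, and training setup may differ.

```python
# Minimal sketch (not the authors' code) of entity typing cast as NLI:
# score each free-form candidate type by how strongly the sentence entails
# the hypothesis "<mention> is a <type>." Assumes an off-the-shelf MNLI
# checkpoint; LITE's actual model, template, and training differ in detail.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # labels: 0=contradiction, 1=neutral, 2=entailment
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

def rank_types(sentence, mention, candidate_types):
    """Return candidate types sorted by entailment probability."""
    premises = [sentence] * len(candidate_types)
    hypotheses = [f"{mention} is a {t}." for t in candidate_types]  # hypothesis template
    batch = tokenizer(premises, hypotheses, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        probs = model(**batch).logits.softmax(dim=-1)
    entailment = probs[:, 2].tolist()  # probability of the "entailment" label
    return sorted(zip(candidate_types, entailment), key=lambda p: -p[1])

# Example: a free-form type set can be scored without retraining a classifier,
# which is why this formulation generalizes to types unseen in training.
print(rank_types(
    "Gates founded the company in 1975 and led it for decades.",
    "Gates",
    ["businessman", "person", "city", "organization"],
))
```

At training time, LITE pairs this NLI scoring with a learning-to-rank objective that pushes entailment scores of gold types above those of sampled negative types, which is what frees the system from a fixed, indexed label set.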
