Paper Title

Entity Tagging: Extracting Entities in Text Without Mention Supervision

Authors

Christina Du, Kashyap Popat, Louis Martin, Fabio Petroni

Abstract

Detecting and disambiguating all entities in text is a crucial task for a wide range of applications. The typical formulation of the problem involves two stages: detecting mention boundaries and linking all mentions to a knowledge base. Mention detection has long been considered a necessary step for extracting all entities in a piece of text, even though the information about mention spans is ignored by some downstream applications that focus only on the set of extracted entities. In this paper we show that, in such cases, detecting mention boundaries does not bring any considerable performance gain in extracting entities and can therefore be skipped. To conduct our analysis, we propose an "Entity Tagging" formulation of the problem, in which models are evaluated purely on the set of extracted entities, without considering mentions. We compare a state-of-the-art mention-aware entity linking solution against GET, a mention-agnostic sequence-to-sequence model that simply outputs a list of disambiguated entities given an input context. We find that these models achieve comparable performance across multiple benchmarks when trained on either a fully or a partially annotated dataset, demonstrating that GET can extract disambiguated entities with strong performance without explicit supervision on mention boundaries.
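To make the "Entity Tagging" evaluation concrete, the following is a minimal sketch of scoring a model purely on the set of entities it extracts per document, with no reference to mention spans. The abstract does not state the exact metric, so micro-averaged precision, recall, and F1 are an assumption here, and the function name `entity_tagging_scores` and the example entity identifiers are hypothetical.

```python
from typing import Iterable, Set, Tuple


def entity_tagging_scores(
    gold: Iterable[Set[str]], pred: Iterable[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-document entity sets.

    Assumption: this metric choice is illustrative; the abstract only says
    that models are evaluated on the set of extracted entities.
    """
    tp = fp = fn = 0
    for gold_entities, pred_entities in zip(gold, pred):
        tp += len(gold_entities & pred_entities)  # entities correctly extracted
        fp += len(pred_entities - gold_entities)  # extracted but not in gold
        fn += len(gold_entities - pred_entities)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical example: two documents with knowledge-base entity identifiers.
gold_sets = [{"Paris", "France"}, {"Apple_Inc."}]
pred_sets = [{"Paris", "Texas"}, {"Apple_Inc."}]
precision, recall, f1 = entity_tagging_scores(gold_sets, pred_sets)
```

Because only sets are compared, a mention-aware linker and a mention-agnostic model such as GET can be scored the same way: the linker's predicted spans are simply discarded and its linked entities are collected into a set per document.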
