Automatic Noisy Label Correction for Fine-Grained Entity Typing

Paper Title

Automatic Noisy Label Correction for Fine-Grained Entity Typing

Authors

Weiran Pan, Wei Wei, Feida Zhu

Abstract

Fine-grained entity typing (FET) aims to assign proper semantic types to entity mentions according to their context, and is a fundamental task in various entity-leveraging applications. Current FET systems are usually built on large-scale weakly supervised or distantly annotated data, which may contain abundant noise and thus severely hinder the performance of the FET task. Although previous studies have achieved great success in automatically identifying noisy labels in FET, they usually rely on auxiliary resources that may be unavailable in real-world applications (e.g., predefined hierarchical type structures or human-annotated subsets). In this paper, we propose a novel approach that automatically corrects noisy labels for FET without external resources. Specifically, it first identifies potentially noisy labels by estimating, from the logits output by the model, the posterior probability of each label being positive or negative, and then relabels the candidate noisy labels by training a robust model on the remaining clean labels. Experiments on two popular benchmarks demonstrate the effectiveness of our method. Our source code is available at https://github.com/CCIIPLab/DenoiseFET.
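The abstract describes a two-step procedure but does not say how the posterior is estimated. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. It assumes that, for a single type, the logits across mentions can be modelled by a two-component Gaussian mixture fitted with EM. This estimator is a swapped-in technique, and the paper's may differ. The sketch treats the higher-mean component as "positive" and flags a distant label when this posterior confidently contradicts it. The function names, the thresholds `low`/`high`, and `robust_probs` are all hypothetical. `robust_probs` stands in for the scores of the robust model trained on the remaining clean labels, which is not shown.

```python
import math
from typing import List, Sequence, Tuple


def _log_normal(x: float, mean: float, var: float) -> float:
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _responsibilities(x: float, weights: List[float], means: List[float],
                      variances: List[float]) -> List[float]:
    """Posterior of each mixture component given x (log-sum-exp for stability)."""
    logs = [math.log(weights[k]) + _log_normal(x, means[k], variances[k]) for k in range(2)]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_two_gaussians(xs: Sequence[float], iters: int = 100,
                      var_floor: float = 1e-3) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1-D Gaussian mixture to xs with plain EM."""
    n = len(xs)
    mu = sum(xs) / n
    init_var = max(sum((x - mu) ** 2 for x in xs) / n, var_floor)
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]  # component 1 starts at the largest logit
    variances = [init_var, init_var]
    for _ in range(iters):
        # E-step: soft assignment of every logit to the two components.
        resp = [_responsibilities(x, weights, means, variances) for x in xs]
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, var_floor)
    return weights, means, variances


def flag_noisy_labels(logits: Sequence[float], labels: Sequence[int],
                      low: float = 0.2, high: float = 0.8) -> List[int]:
    """Return indices whose distant label disagrees with the estimated posterior.

    Assumption: a 2-component Gaussian mixture over the logits of one type
    approximates P(positive | logit); this is a stand-in estimator.
    A label 1 is flagged when P(positive) < low; a label 0 when P(positive) > high.
    """
    weights, means, variances = fit_two_gaussians(logits)
    pos = 0 if means[0] > means[1] else 1  # higher-mean component models positives
    flagged = []
    for i, (z, y) in enumerate(zip(logits, labels)):
        p_pos = _responsibilities(z, weights, means, variances)[pos]
        if (y == 1 and p_pos < low) or (y == 0 and p_pos > high):
            flagged.append(i)
    return flagged


def relabel(labels: Sequence[int], flagged: Sequence[int],
            robust_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Replace each flagged label with the decision of a hypothetical robust model.

    robust_probs[i] is assumed to come from a model trained only on the
    unflagged (presumed clean) labels; that training is not shown here.
    """
    new_labels = list(labels)
    for i in flagged:
        new_labels[i] = 1 if robust_probs[i] >= threshold else 0
    return new_labels


if __name__ == "__main__":
    # Toy logits for one type over ten mentions; index 3 looks like a missed
    # positive and index 5 like a spurious positive.
    logits = [4.1, 3.8, 4.4, 3.9, -3.9, -4.2, -3.7, -4.0, -4.1, 4.0]
    labels = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1]
    print(flag_noisy_labels(logits, labels))  # → [3, 5]
```

In this toy run, index 3 (high logit, labelled negative) and index 5 (low logit, labelled positive) are flagged, and only those entries would be passed to `relabel`. Using two thresholds leaves ambiguous labels untouched rather than forcing a decision on them.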
