SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery

Paper Title

SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery

Authors

Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren, Jiawei Han

Abstract

Entity set expansion and synonym discovery are two critical NLP tasks. Previous studies accomplish them separately, without exploring their interdependencies. In this work, we hypothesize that these two tasks are tightly coupled because two synonymous entities tend to have similar likelihoods of belonging to various semantic classes. This motivates us to design SynSetExpan, a novel framework that enables the two tasks to mutually enhance each other. SynSetExpan uses a synonym discovery model to include popular entities' infrequent synonyms in the set, which boosts the set expansion recall. Meanwhile, the set expansion model, being able to determine whether an entity belongs to a semantic class, can generate pseudo training data to fine-tune the synonym discovery model towards better accuracy. To facilitate research on the interplay of these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing. Extensive experiments on the SE2 dataset and previous benchmarks demonstrate the effectiveness of SynSetExpan for both entity set expansion and synonym discovery tasks.
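
The mutual enhancement described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the score threshold, the probability gap, and the example entities are all hypothetical, and the real models are replaced here by hand-written scores. The first function shows how known synonyms might pull low-scoring, infrequent surface forms into an expanded set; the second shows one plausible way class-membership likelihoods could yield pseudo training pairs, using the hypothesis that synonyms have similar likelihoods (so entities with very different likelihoods are treated as unlikely synonyms).

```python
def expand_with_synonyms(seeds, class_scores, synonym_pairs, threshold=0.5):
    """One toy expansion round (hypothetical, not the paper's algorithm).

    class_scores maps entity -> score that it belongs to the target class.
    synonym_pairs is a list of (entity, entity) pairs judged synonymous.
    """
    expanded = set(seeds)
    # Set expansion step: accept entities whose class score passes the threshold.
    expanded |= {e for e, s in class_scores.items() if s >= threshold}
    # Synonym step: a single pass that adds the synonym of any accepted entity,
    # even when that synonym's own class score is low.
    for a, b in synonym_pairs:
        if a in expanded:
            expanded.add(b)
        if b in expanded:
            expanded.add(a)
    return expanded


def pseudo_negative_pairs(class_probs, gap=0.5):
    """Return entity pairs whose class likelihoods differ by more than `gap`
    in at least one class; under the abstract's hypothesis these are unlikely
    to be synonyms, so they could serve as pseudo negative examples.

    class_probs maps entity -> list of per-class membership likelihoods.
    """
    names = sorted(class_probs)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diffs = (abs(x - y) for x, y in zip(class_probs[a], class_probs[b]))
            if max(diffs) > gap:
                pairs.append((a, b))
    return pairs


# Made-up example: target class "country", seed "United States".
scores = {"Canada": 0.9, "U.S.": 0.3, "Chicago": 0.1}
countries = expand_with_synonyms({"United States"}, scores,
                                 [("United States", "U.S.")])

# Made-up likelihoods over the classes [country, city].
probs = {"Canada": [0.9, 0.1], "U.S.": [0.8, 0.2], "Chicago": [0.1, 0.9]}
negatives = pseudo_negative_pairs(probs)
```

In this toy run, "U.S." enters the set only through its synonym link to the seed, which mirrors the recall gain the abstract attributes to the synonym discovery model. In the full framework the two steps would alternate over several iterations with learned models in place of the fixed scores.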
