Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Paper Title

Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Authors

Zhenran Xu, Zifei Shan, Yuxin Li, Baotian Hu, Bing Qin

Abstract

Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new Chinese benchmark that fills this gap for few-shot and zero-shot EL in non-English languages. The test set of Hansel is human-annotated and reviewed, and was created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents from news, social media posts, and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that achieves an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.
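The results above are reported as R@1 (recall at rank 1). As a minimal sketch, assuming R@1 is computed per mention as the fraction of mentions whose gold Wikidata entity is ranked first by the system, the metric can be computed as below. The paper's exact evaluation protocol (for example, how NIL mentions or ties are handled) may differ; the function name `recall_at_k` and the QIDs in the example are hypothetical placeholders, not real annotations from Hansel.

```python
from typing import List, Sequence


def recall_at_k(predictions: Sequence[List[str]], gold: Sequence[str], k: int = 1) -> float:
    """Fraction of mentions whose gold entity ID appears in the top-k ranked candidates.

    predictions[i] is the system's ranked list of entity IDs for mention i
    (best first); gold[i] is the annotated entity ID for mention i.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    # Count a hit when the gold ID is among the first k ranked candidates.
    hits = sum(1 for ranked, g in zip(predictions, gold) if g in ranked[:k])
    return hits / len(gold)


# Illustrative placeholder data: ranked QIDs per mention (not real Hansel data).
preds = [["Q1", "Q2"], ["Q3", "Q4"], ["Q5"], []]
gold = ["Q1", "Q4", "Q5", "Q6"]
print(recall_at_k(preds, gold, k=1))  # 0.5
print(recall_at_k(preds, gold, k=2))  # 0.75
```

When a system outputs only a single top prediction per mention, R@1 under this definition reduces to plain linking accuracy; larger k is useful for evaluating candidate retrieval before reranking.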