Paper Title

Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Authors

Xu, Zhenran, Shan, Zifei, Li, Yuxin, Hu, Baotian, Qin, Bing

Abstract

Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the gap in non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, and was created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.
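
The abstract reports results as R@1, that is, recall at rank 1: the share of mentions for which the gold entity is the top-ranked candidate. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `recall_at_k`, the input layout, and the entity IDs in the usage example are all illustrative assumptions.

```python
from typing import Sequence


def recall_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    gold_entities: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of mentions whose gold entity is among the top-k candidates.

    ranked_candidates[i] is the model's ranked candidate list for mention i
    (best first); gold_entities[i] is the annotated entity ID for mention i.
    This input layout is an assumption made for illustration.
    """
    if len(ranked_candidates) != len(gold_entities):
        raise ValueError("need one ranked list per gold entity")
    if not gold_entities:
        return 0.0
    hits = sum(
        1
        for candidates, gold in zip(ranked_candidates, gold_entities)
        if gold in candidates[:k]
    )
    return hits / len(gold_entities)


# Hypothetical example: two mentions with made-up entity IDs.
predictions = [["Q_A", "Q_B"], ["Q_C", "Q_D"]]
gold = ["Q_A", "Q_D"]
r_at_1 = recall_at_k(predictions, gold, k=1)  # first mention hits, second misses
r_at_2 = recall_at_k(predictions, gold, k=2)  # both gold IDs are in the top 2
```

With k = 1 the metric is the same as top-1 accuracy over mentions, which is how figures such as 36.6% or 46.2% would be read under this definition.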
