Paper title
Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark
Paper authors
Paper abstract
Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.
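The R@1 figures quoted in the abstract are most naturally read as recall at rank 1: the share of mentions for which the top-ranked candidate is the gold Wikidata entity. The following is a minimal sketch of that metric under this assumption; it is not the authors' evaluation code, and the function name `recall_at_k`, the mention IDs, and the QIDs are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def recall_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int = 1) -> float:
    """Fraction of gold mentions whose entity ID appears in the top-k candidates.

    ranked: mention ID -> candidate entity IDs, best first (hypothetical format).
    gold:   mention ID -> correct entity ID.
    """
    if not gold:
        return 0.0
    # A mention with no predictions counts as a miss.
    hits = sum(1 for mention, entity in gold.items()
               if entity in ranked.get(mention, [])[:k])
    return hits / len(gold)


# Toy data: mention IDs and Wikidata-style QIDs are made up for illustration.
gold = {"m1": "Q1", "m2": "Q2", "m3": "Q3"}
ranked = {"m1": ["Q1", "Q9"], "m2": ["Q8", "Q2"], "m3": ["Q7"]}

r_at_1 = recall_at_k(ranked, gold, k=1)  # only m1 is correct at rank 1
r_at_2 = recall_at_k(ranked, gold, k=2)  # m1 and m2 are correct within the top 2
```

Under this reading, a Few-Shot R@1 of 46.2% would mean that the top candidate is correct for a little under half of the few-shot mentions.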