Multilingual Event Linking to Wikidata

Paper title

Multilingual Event Linking to Wikidata

Authors

Adithya Pratapa, Rishubh Gupta, Teruko Mitamura

Abstract

We present the task of multilingual linking of events to a knowledge base. We automatically compile a large-scale dataset for this task, comprising 1.8M mentions across 44 languages that refer to over 10.9K events from Wikidata. We propose two variants of the event linking task: 1) multilingual, where event descriptions are in the same language as the mention, and 2) crosslingual, where all event descriptions are in English. On the two proposed tasks, we compare multiple event linking systems, including BM25+ (Lv and Zhai, 2011) and multilingual adaptations of the biencoder and crossencoder architectures from BLINK (Wu et al., 2020). In our experiments on the two task variants, we find that both the biencoder and crossencoder models significantly outperform the BM25+ baseline. Our results also indicate that the crosslingual task is in general more challenging than the multilingual task. To test the out-of-domain generalization of the proposed linking systems, we additionally create a Wikinews-based evaluation set. We present a qualitative analysis highlighting various aspects captured by the proposed dataset, including the need for temporal reasoning over context and for handling diverse event descriptions across languages.
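
To make the BM25+ baseline named in the abstract concrete, the following is a minimal sketch of BM25+ scoring (Lv and Zhai, 2011) applied to ranking candidate event descriptions for a mention context. It is an illustration under stated assumptions, not the authors' implementation: the function name `bm25_plus_rank`, the whitespace tokenization, the example texts, and the parameter values `k1`, `b`, and `delta` are all hypothetical, and query term frequency is ignored for brevity.

```python
import math
from collections import Counter


def bm25_plus_rank(query_tokens, docs_tokens, k1=1.2, b=0.75, delta=1.0):
    """Score each candidate document against the query with BM25+.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    Returns (ranking, scores), where ranking lists document indices from
    best to worst.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs_tokens for term in set(d))

    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        # Length normalization term shared by all query terms for this document.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        # Simplification: each distinct query term counts once.
        for term in set(query_tokens):
            if tf[term] == 0:
                continue  # BM25+ sums only over terms present in the document
            idf = math.log((n_docs + 1) / df[term])
            # delta lower-bounds the contribution of any matched term.
            score += idf * ((k1 + 1) * tf[term] / (norm + tf[term]) + delta)
        scores.append(score)

    ranking = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Made-up mention context and candidate event descriptions (assumptions).
mention = "the 2011 earthquake and tsunami struck japan".split()
candidates = [
    "earthquake and tsunami in japan in 2011".split(),
    "football world cup final in 2014".split(),
]
ranking, scores = bm25_plus_rank(mention, candidates)
```

Here the first candidate shares several terms with the mention and the second shares none, so the first is ranked on top. A lexical scorer like this can only match mentions and descriptions that share surface tokens.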
