Paper Title

Event Guided Denoising for Multilingual Relation Learning

Authors

Amith Ananthram, Emily Allaway, Kathleen McKeown

Abstract

General purpose relation extraction has recently seen considerable gains in part due to a massively data-intensive distant supervision technique from Soares et al. (2019) that produces state-of-the-art results across many benchmarks. In this work, we present a methodology for collecting high quality training data for relation extraction from unlabeled text that achieves a near-recreation of their zero-shot and few-shot results at a fraction of the training cost. Our approach exploits the predictable distributional structure of date-marked news articles to build a denoised corpus -- the extraction process filters out low quality examples. We show that a smaller multilingual encoder trained on this corpus performs comparably to the current state-of-the-art (when both receive little to no fine-tuning) on few-shot and standard relation benchmarks in English and Spanish despite using many fewer examples (50k vs. 300mil+).
