Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

Paper title

Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

Authors

Fei, Hao; Zhang, Meishan; Ji, Donghong

Abstract

Much research effort has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available, as is the case for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, obtaining competitive performance remains challenging. Cross-lingual SRL is one promising way to address this problem, and it has advanced considerably with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, which constructs high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on the Universal Proposition Bank show that the translation-based method is highly effective and that the automatically constructed pseudo datasets significantly improve target-language SRL performance.
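The abstract only summarizes the translation-based approach. When a gold-annotated source corpus is translated, the SRL labels still have to be carried over to the target-language tokens; one common way to do this is projection through word alignments. The sketch below illustrates that generic step under stated assumptions: the translation and the (source index, target index) alignment pairs are taken as given (e.g. from an MT system and a word aligner), labels are flat per-token role tags, and the function name `project_roles` is hypothetical. It is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def project_roles(
    src_roles: Sequence[str],
    alignment: Sequence[Tuple[int, int]],
    tgt_len: int,
    null_label: str = "O",
) -> List[str]:
    """Hypothetical helper: copy per-token SRL labels onto a translation.

    src_roles: one label per source token, e.g. "V" for the predicate,
        "A0", "A1", "AM-TMP" for arguments, or null_label for none.
    alignment: (source_index, target_index) word-alignment pairs,
        assumed to come from an external aligner.
    tgt_len: number of tokens in the translated sentence.
    """
    tgt_roles = [null_label] * tgt_len
    # Visit pairs by target position, then source position, so the result
    # does not depend on the order the aligner emitted the pairs in.
    for s, t in sorted(alignment, key=lambda pair: (pair[1], pair[0])):
        label = src_roles[s]
        # A target token keeps the first non-null label it receives;
        # unaligned target tokens stay null_label.
        if label != null_label and tgt_roles[t] == null_label:
            tgt_roles[t] = label
    return tgt_roles


if __name__ == "__main__":
    # English source "He read the book yesterday", German translation
    # "Er las gestern das Buch" (word order changes around the adverb).
    src_roles = ["A0", "V", "A1", "A1", "AM-TMP"]
    alignment = [(0, 0), (1, 1), (4, 2), (2, 3), (3, 4)]
    print(project_roles(src_roles, alignment, tgt_len=5))
```

Running the example prints `['A0', 'V', 'AM-TMP', 'A1', 'A1']`: the temporal argument follows "yesterday" to its new position after the verb. A real pipeline would also need to handle alignment noise and filter low-quality projections, which this sketch does not attempt.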
