Title
Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus
Authors
Abstract
Much research effort has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available, as they are for resource-rich languages such as English. For low-resource languages without annotated SRL datasets, however, obtaining competitive performance remains challenging. Cross-lingual SRL is one promising way to address this problem, and it has advanced considerably with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, which constructs high-quality training datasets for the target languages from source-language gold-standard SRL annotations. Experimental results on the Universal Proposition Bank show that the translation-based method is highly effective and that the automatically constructed pseudo datasets significantly improve target-language SRL performance.
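Building a target-language training set from source gold annotations requires moving role labels from a source sentence onto its translation. The sketch below shows one generic way to do that: word-level label projection through a word alignment. It assumes dependency-style labels, where each argument head word carries a role tag. It also assumes the alignment comes from some external aligner as (source index, target index) pairs. The function name `project_roles`, the `"_"` null label, the `"V"` predicate tag, and the tie-breaking rule are all hypothetical choices for illustration. This is not the paper's exact procedure.

```python
from typing import List, Tuple

# Hypothetical alignment format: (source_index, target_index) pairs,
# assumed to come from an external word aligner.
Alignment = List[Tuple[int, int]]


def project_roles(
    src_labels: List[str],
    tgt_len: int,
    alignment: Alignment,
    null_label: str = "_",
) -> List[str]:
    """Copy each labeled source word's role onto its aligned target word(s).

    Unaligned target words keep `null_label`. A source word aligned to several
    target words labels all of them. If two labeled source words align to the
    same target word, the one with the lower source index wins.
    """
    tgt_labels = [null_label] * tgt_len
    # Sorting by source index makes the tie-breaking rule deterministic.
    for src_i, tgt_j in sorted(alignment):
        label = src_labels[src_i]
        if label != null_label and tgt_labels[tgt_j] == null_label:
            tgt_labels[tgt_j] = label
    return tgt_labels


if __name__ == "__main__":
    # "The cat ate fish" -> "Den Fisch fraß die Katze" (object-first German order).
    src = ["_", "A0", "V", "A1"]
    align = [(0, 3), (1, 4), (2, 2), (3, 1)]
    print(project_roles(src, 5, align))
```

Running the example prints `['_', 'A1', 'V', '_', 'A0']`. The roles follow the aligned words through the reordering, and the unaligned article "Den" stays unlabeled. A real pipeline would also need to handle alignment noise and argument spans, which this sketch ignores.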