Paper Title
VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation
Paper Authors
Paper Abstract
Existing work in multilingual pretraining has demonstrated the potential of cross-lingual transferability by training a unified Transformer encoder for multiple languages. However, much of this work only relies on the shared vocabulary and bilingual contexts to encourage the correlation across languages, which is loose and implicit for aligning the contextual representations between languages. In this paper, we plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages. It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language. More importantly, when fine-tuning on downstream tasks, the cross-attention module can be plugged in or out on-demand, thus naturally benefiting a wider range of cross-lingual tasks, from language understanding to generation. As a result, the proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark, covering text classification, sequence labeling, question answering, and sentence retrieval. For cross-lingual generation tasks, it also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1~2 BLEU.
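To make the "plug in or out" idea concrete, below is a minimal PyTorch sketch of an encoder layer whose cross-attention sub-layer attends to a second (other-language) sequence and is simply skipped when no such context is passed. All class names, hyperparameters, and the toggling mechanism are illustrative assumptions for exposition, not the authors' released VECO implementation.

```python
import torch
import torch.nn as nn
from typing import Optional

class PlugInCrossAttentionLayer(nn.Module):
    """Hypothetical encoder layer with an optional cross-attention sub-layer.

    Passing memory=None skips cross-attention, mimicking the "plugged out"
    encoder-only mode described in the abstract; passing an other-language
    batch "plugs it in".
    """

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, memory: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Standard self-attention over the input-language sequence.
        h, _ = self.self_attn(x, x, x)
        x = self.norm1(x + h)
        # Cross-attention is applied only when a cross-lingual context is provided.
        if memory is not None:
            h, _ = self.cross_attn(x, memory, memory)
            x = self.norm2(x + h)
        # Position-wise feed-forward network.
        return self.norm3(x + self.ffn(x))


# Usage: src is an input-language batch, tgt_ctx an aligned other-language batch.
layer = PlugInCrossAttentionLayer()
src = torch.randn(2, 10, 512)
tgt_ctx = torch.randn(2, 12, 512)
out_cross = layer(src, memory=tgt_ctx)  # cross-attention "plugged in"
out_mono = layer(src)                   # cross-attention "plugged out"
```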