Paper Title
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Paper Authors
Paper Abstract
We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training. Our goal is to learn universal representations that can map objects occurring in different modalities, or texts expressed in different languages, into a common semantic space. In addition, to explicitly encourage fine-grained alignment between images and non-English languages, we also propose Multimodal Code-switched Training (MCT), which combines monolingual pre-training and multimodal pre-training via a code-switch strategy. Experiments are performed on the multilingual image retrieval task across two benchmark datasets, MSCOCO and Multi30K. M3P achieves results comparable to the state of the art for English and new state-of-the-art results for non-English languages.
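For intuition, below is a minimal, hypothetical Python sketch of the code-switch idea behind MCT: some English words in an image caption are randomly replaced with translations drawn from a bilingual dictionary, while the caption stays paired with its original image. The dictionary entries, the function name code_switch, and the replacement probability p are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Toy English -> German dictionary; hypothetical entries for illustration only.
BILINGUAL_DICT = {
    "dog": "Hund",
    "ball": "Ball",
    "plays": "spielt",
}

def code_switch(caption, dictionary, p=0.5):
    """Randomly replace tokens that have a dictionary translation."""
    tokens = caption.split()
    switched = [
        dictionary[tok] if tok in dictionary and random.random() < p else tok
        for tok in tokens
    ]
    return " ".join(switched)

if __name__ == "__main__":
    random.seed(0)
    # The switched caption remains paired with the original image, so
    # non-English words get aligned with visual content during pre-training.
    print(code_switch("a dog plays with a ball", BILINGUAL_DICT))
```

Because the image is unchanged, the model sees non-English tokens in the same visual context as their English counterparts, which is what encourages the fine-grained image-to-non-English alignment described in the abstract.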