Paper Title
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation
Paper Authors
Paper Abstract
One of the major challenges of machine translation (MT) is ambiguity, which can in some cases be resolved by accompanying context such as images. However, recent work in multimodal MT (MMT) has shown that obtaining improvements from images is challenging, limited not only by the difficulty of building effective cross-modal representations, but also by the lack of specific evaluation and training data. We present a new MMT approach based on a strong text-only MT model, which uses neural adapters and a novel guided self-attention mechanism, and which is jointly trained on both visually-conditioned masking and MMT. We also introduce CoMMuTE, a Contrastive Multilingual Multimodal Translation Evaluation set of ambiguous sentences and their possible translations, accompanied by disambiguating images corresponding to each translation. Our approach obtains competitive results compared to strong text-only models on standard English-to-French, English-to-German and English-to-Czech benchmarks, and outperforms baselines and state-of-the-art MMT systems by a large margin on our contrastive test set. Our code and CoMMuTE are freely available.
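The contrastive evaluation described above can be sketched as follows. This is a minimal illustration of the general protocol, not the authors' implementation: each CoMMuTE-style example pairs an ambiguous source sentence and a disambiguating image with two candidate translations, and a system is credited when it scores the translation matching the image higher. The `toy_score` function is a hypothetical stand-in; a real MMT system would return, e.g., the model's log-probability of the candidate translation given the source and the image.

```python
def contrastive_accuracy(examples, score):
    """Fraction of examples where the image-matching translation outscores the other.

    Each example holds one ambiguous source sentence, one image (here a tag),
    two candidate translations, and the index of the correct candidate.
    """
    hits = 0
    for ex in examples:
        scores = [score(ex["src"], ex["image"], cand) for cand in ex["candidates"]]
        if scores[ex["correct"]] > scores[1 - ex["correct"]]:
            hits += 1
    return hits / len(examples)


# Hypothetical toy scorer: favors the candidate sharing a keyword with the
# image tag. A real scorer would come from an MMT model.
def toy_score(src, image_tag, candidate):
    return 1.0 if image_tag in candidate else 0.0


# Illustrative example: "bank" is ambiguous (riverbank vs. financial bank);
# the image determines which French translation is correct.
examples = [
    {
        "src": "He sat by the bank.",
        "image": "river",
        "candidates": [
            "Il s'est assis au bord de la riviere (river sense)",
            "Il s'est assis pres de la banque (money sense)",
        ],
        "correct": 0,
    },
]
print(contrastive_accuracy(examples, toy_score))
```

With this toy scorer the single example is resolved correctly, giving an accuracy of 1.0; the same loop applies unchanged to any model-based scoring function.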