Paper Title
Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences
Authors
Abstract
In this paper, we propose a new machine translation (MT) task in which no parallel sentences are available but a ground-truth bilingual dictionary can be consulted. Motivated by the ability of a monolingual speaker to learn to translate by looking up words in a bilingual dictionary, we propose this task to see how much potential an MT system can attain using a bilingual dictionary and large-scale monolingual corpora, while remaining independent of parallel sentences. We propose anchored training (AT) to tackle the task. AT uses the bilingual dictionary to establish anchoring points that close the gap between the source and target languages. Experiments on various language pairs show that our approaches are significantly better than various baselines, including dictionary-based word-by-word translation, dictionary-supervised cross-lingual word embedding transformation, and unsupervised MT. On distant language pairs, where unsupervised MT struggles to perform well, AT performs remarkably better, achieving performance comparable to supervised SMT trained on more than 4M parallel sentences.
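To make the simplest baseline named in the abstract concrete, here is a minimal sketch of dictionary-based word-by-word translation. It is an illustration only, not the paper's implementation: the function name `translate_word_by_word`, the whitespace tokenization, the lowercase lookup, and the toy English-to-French dictionary are all assumptions made for this example.

```python
def translate_word_by_word(sentence, dictionary):
    """Replace each token with its dictionary entry; copy unknown tokens as-is.

    Assumptions (not from the paper): whitespace tokenization, lowercase
    lookup, and exactly one translation per dictionary entry.
    """
    tokens = sentence.split()
    return " ".join(dictionary.get(token.lower(), token) for token in tokens)


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"the": "le", "cat": "chat", "sleeps": "dort"}

print(translate_word_by_word("The cat sleeps quietly", toy_dictionary))
# prints: le chat dort quietly
```

The word "quietly" is missing from the toy dictionary, so it is copied through untranslated. Such a baseline also cannot reorder words or use context to choose between senses.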