Paper Title

Character-level Transformer-based Neural Machine Translation

Paper Authors

Nikolay Banar, Walter Daelemans, Mike Kestemont

Abstract

Neural machine translation (NMT) is nowadays commonly applied at the subword level, using byte-pair encoding. A promising alternative approach focuses on character-level translation, which simplifies the processing pipeline in NMT considerably. This approach, however, must handle relatively longer sequences, rendering the training process prohibitively expensive. In this paper, we discuss a novel Transformer-based approach that we compare, both in speed and in quality, to the Transformer at the subword and character levels, as well as to previously developed character-level models. We evaluate our models on 4 language pairs from WMT'15: DE-EN, CS-EN, FI-EN and RU-EN. The proposed novel architecture can be trained on a single GPU and is 34% faster than the character-level Transformer; still, the obtained results are at least on par with it. In addition, our proposed model outperforms the subword-level model in FI-EN and shows close results in CS-EN. To stimulate further research in this area and close the gap with subword-level NMT, we make all our code and models publicly available.
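
As a rough illustration of the sequence-length gap the abstract refers to (not taken from the paper), the Python sketch below tokenizes the same sentence at the character level and with a toy, hand-specified subword vocabulary that mimics byte-pair-encoding output. The sentence, the vocabulary, and the helper toy_subword_split are invented for this example; a real BPE vocabulary would be learned from the training corpus.

    # Illustrative sketch only: compares how many tokens a character-level model
    # and a toy subword segmentation must handle for the same sentence.
    sentence = "machine translation simplifies pipelines"

    # Character-level view: every character (including spaces) is one token,
    # so the Transformer attends over a much longer sequence.
    char_tokens = list(sentence)

    # Toy "subword" view: greedy longest-match splits against a tiny,
    # hand-picked vocabulary, mimicking what byte-pair encoding produces.
    toy_vocab = {"machine", "trans", "lation", "simpl", "ifies", "pipe", "lines"}

    def toy_subword_split(word, vocab):
        """Greedy longest-match segmentation; '@@' marks non-final pieces, as in BPE."""
        pieces, start = [], 0
        while start < len(word):
            for end in range(len(word), start, -1):
                if word[start:end] in vocab or end - start == 1:
                    piece = word[start:end]
                    pieces.append(piece + "@@" if end < len(word) else piece)
                    start = end
                    break
        return pieces

    subword_tokens = [p for w in sentence.split() for p in toy_subword_split(w, toy_vocab)]

    print(len(char_tokens), "character tokens")   # 40 character tokens
    print(len(subword_tokens), "subword tokens")  # 7 subword tokens

For this toy sentence the character-level view is five to six times longer than the subword view, which is the kind of growth that makes training attention-based models on character sequences expensive.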
