Paper Title
Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation
Paper Authors
Abstract
Lack of specialized data makes building a multi-domain neural machine translation tool challenging. Although the emerging literature on low-resource languages is starting to show promising results, most state-of-the-art models use millions of sentences. Today, most multi-domain adaptation techniques rely on complex, sophisticated architectures that are not suited to real-world applications. So far, no scalable method performs better than the simple yet effective mixed finetuning, i.e., finetuning a generic model on a mix of all the specialized data and generic data. In this paper, we propose a new training pipeline in which knowledge distillation and multiple specialized teachers allow us to efficiently finetune a model without adding new costs at inference time. Our experiments demonstrate that our training pipeline improves the performance of multi-domain translation over finetuning by up to 2 BLEU points in configurations with 2, 3, and 4 domains.
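To make the idea of distilling from multiple specialized teachers more concrete, the sketch below shows one common way such an objective can be written: a mix of cross-entropy on the reference translations and a temperature-softened KL term toward the teacher that matches the batch's domain. This is a minimal illustration under our own assumptions, not the paper's exact formulation; the function name and the `temperature` and `alpha` values are hypothetical.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_logits, teacher_logits, target_ids,
                                     pad_id=0, temperature=2.0, alpha=0.5):
    """Hypothetical distillation objective for one domain-homogeneous batch:
    the teacher_logits come from the specialized teacher of that batch's domain."""
    vocab_size = student_logits.size(-1)

    # Standard cross-entropy against the reference translation.
    ce = F.cross_entropy(
        student_logits.view(-1, vocab_size),
        target_ids.view(-1),
        ignore_index=pad_id,
    )

    # KL divergence between temperature-softened student and teacher distributions.
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    # Interpolate the two terms; alpha balances reference supervision vs. teacher guidance.
    return alpha * ce + (1.0 - alpha) * kd
```

Because the teachers are only consulted during training, the student keeps a single generic architecture and incurs no additional cost at inference time, which is the property the abstract emphasizes.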