Paper Title

Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

Paper Authors

Mitchell A. Gordon, Kevin Duh

Paper Abstract

We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely-used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher.
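
To make the "distill twice" recipe concrete, below is a minimal Python sketch of the pipeline the abstract describes. The `train` and `translate` helpers (and the `size` argument) are hypothetical placeholders for a real NMT toolkit; only the ordering of the four steps (train teacher, distill, adapt teacher, distill again) follows the abstract.

```python
# Sketch of the two-step distillation recipe described in the abstract.
# `train` and `translate` are hypothetical placeholders for a real NMT
# toolkit (e.g. fairseq or Sockeye); only the step ordering is the point.

def train(parallel_pairs, size="large", init=None):
    """Hypothetical: train (or fine-tune, when `init` is given) an NMT model
    of the requested size on (source, target) sentence pairs."""
    raise NotImplementedError

def translate(model, source_sentences):
    """Hypothetical: beam-search decode the source sentences with `model`."""
    raise NotImplementedError

def distill_adapt_distill(general_corpus, in_domain_corpus):
    general_src = [src for src, _ in general_corpus]
    in_domain_src = [src for src, _ in in_domain_corpus]

    # 1) Train a large teacher on general-domain parallel data.
    teacher = train(general_corpus, size="large")

    # 2) First distillation: decode the general-domain source side with the
    #    teacher and train a small student on (source, teacher output) pairs
    #    (sequence-level knowledge distillation).
    student = train(list(zip(general_src, translate(teacher, general_src))),
                    size="small")

    # 3) Adapt the teacher by continued training on in-domain parallel data.
    adapted_teacher = train(in_domain_corpus, init=teacher)

    # 4) Second distillation: decode the in-domain source side with the
    #    adapted teacher and fine-tune the student on those outputs.
    student = train(list(zip(in_domain_src,
                             translate(adapted_teacher, in_domain_src))),
                    size="small", init=student)

    return student
```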
