Title

Efficient Machine Translation Domain Adaptation

Authors

Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Abstract

Machine translation models struggle when translating out-of-domain text, which makes domain adaptation a topic of critical importance. However, most domain adaptation methods focus on fine-tuning or training all or part of the model on every new domain, which can be costly. On the other hand, semi-parametric models have been shown to successfully perform domain adaptation by retrieving examples from an in-domain datastore (Khandelwal et al., 2021). A drawback of these retrieval-augmented models, however, is that they tend to be substantially slower. In this paper, we explore several approaches to speed up nearest neighbor machine translation. We adapt the methods recently proposed by He et al. (2021) for language modeling, and introduce a simple but effective caching strategy that avoids performing retrieval when similar contexts have been seen before. Translation quality and runtimes for several domains show the effectiveness of the proposed solutions.
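The caching strategy described in the abstract can be illustrated with a minimal sketch. This is an assumed design, not the authors' exact implementation: previously retrieved datastore neighbors are cached keyed by the decoder context vector, and the expensive kNN search over the in-domain datastore is skipped whenever the current context falls within a distance threshold of a cached one. The names `RetrievalCache`, `retrieve_with_cache`, and the threshold `tau` are all illustrative.

```python
import math


class RetrievalCache:
    """Cache of previously retrieved neighbors, keyed by decoder
    context vectors (hypothetical sketch of the paper's idea)."""

    def __init__(self, tau):
        self.tau = tau      # distance threshold for a cache hit (assumed)
        self.keys = []      # cached context vectors
        self.values = []    # cached retrieval results

    def lookup(self, query):
        # Return cached neighbors if some cached context is within tau
        # of the query; linear scan for simplicity.
        for key, value in zip(self.keys, self.values):
            if math.dist(query, key) <= self.tau:
                return value
        return None

    def add(self, query, neighbors):
        self.keys.append(query)
        self.values.append(neighbors)


def retrieve_with_cache(query, cache, datastore_search):
    """datastore_search stands in for the expensive kNN lookup
    over the in-domain datastore."""
    hit = cache.lookup(query)
    if hit is not None:
        return hit                      # retrieval skipped entirely
    neighbors = datastore_search(query)
    cache.add(query, neighbors)
    return neighbors
```

With a threshold of 0.5, two context vectors 0.1 apart would trigger only one datastore search: the second query reuses the cached result, which is where the speedup over vanilla nearest neighbor machine translation would come from.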
