Paper Title

Iterative Domain-Repaired Back-Translation

Authors

Wei, Hao-Ran, Zhang, Zhirui, Chen, Boxing, Luo, Weihua

Abstract

In this paper, we focus on domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent. A common and effective strategy for this case is to exploit in-domain monolingual data with the back-translation method. However, the synthetic parallel data are very noisy because they are generated by imperfect out-of-domain systems, resulting in poor domain-adaptation performance. To address this issue, we propose a novel iterative domain-repaired back-translation framework, which introduces a Domain-Repair (DR) model to refine the translations in synthetic bilingual data. To this end, we construct the corresponding training data for the DR model by round-trip translating the monolingual sentences, and then design a unified training framework to jointly optimize the paired DR and NMT models. Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach, achieving 15.79 and 4.47 BLEU improvements on average over unadapted models and back-translation, respectively.
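The round-trip data construction described above can be sketched as follows. This is a minimal illustration of the data flow only: the translator functions are hypothetical stand-ins, not the authors' actual NMT systems, and `construct_dr_data` is a name chosen here for illustration.

```python
def construct_dr_data(monolingual_tgt, translate_tgt2src, translate_src2tgt):
    """Round-trip each clean in-domain target sentence: tgt -> src -> tgt'.

    The noisy round-trip output tgt' is paired with the clean original,
    yielding (noisy, clean) pairs on which the Domain-Repair (DR) model
    can learn to refine translations in synthetic bilingual data.
    """
    dr_pairs = []
    for clean_tgt in monolingual_tgt:
        synthetic_src = translate_tgt2src(clean_tgt)  # back-translation step
        noisy_tgt = translate_src2tgt(synthetic_src)  # forward re-translation
        dr_pairs.append((noisy_tgt, clean_tgt))
    return dr_pairs


if __name__ == "__main__":
    # Toy stand-in "translators" that merely tag the text, to show the flow.
    mono = ["in-domain sentence A", "in-domain sentence B"]
    bt = lambda s: f"src({s})"
    fwd = lambda s: f"tgt({s})"
    for noisy, clean in construct_dr_data(mono, bt, fwd):
        print(noisy, "->", clean)
```

In the full framework, the resulting (noisy, clean) pairs train the DR model, which then repairs the synthetic side of back-translated data before it is used to adapt the NMT model.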
