Paper Title
End-to-End Training for Back-Translation with Categorical Reparameterization Trick
Paper Authors
Paper Abstract
Back-translation (BT) is an effective semi-supervised learning framework for neural machine translation (NMT). A pre-trained NMT model translates monolingual sentences and produces synthetic bilingual sentence pairs for training another NMT model, and vice versa. Interpreting the two NMT models as inference and generation models, respectively, previous works applied the training method of the variational auto-encoder (VAE), a mainstream framework for generative models. However, the discrete nature of translated sentences prevents gradient information from flowing between the two NMT models. In this paper, we propose the categorical reparameterization trick (CRT), which makes NMT models generate differentiable sentences so that the VAE training framework can work in an end-to-end fashion. Our BT experiment on a WMT benchmark dataset demonstrates the superiority of the proposed CRT over the Gumbel-softmax trick, a popular reparameterization method for categorical variables. Moreover, our experiments on multiple WMT benchmark datasets show that the proposed end-to-end training framework is effective in terms of BLEU scores, not only compared to its counterpart baseline that is not trained end-to-end, but also compared to other previous BT works. The code is available online.
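To make the discrete-bottleneck issue concrete, the following is a minimal sketch of the straight-through Gumbel-softmax trick, the baseline reparameterization method the abstract compares against. It is not the paper's proposed CRT; the function name, the straight-through variant, and the toy tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_st(logits, tau=1.0):
    """Straight-through Gumbel-softmax: one-hot tokens in the forward pass,
    differentiable soft probabilities in the backward pass (illustrative sketch)."""
    # Sample Gumbel(0, 1) noise and form the relaxed categorical sample.
    gumbels = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y_soft = F.softmax((logits + gumbels) / tau, dim=-1)
    # Discretize to one-hot, but route gradients through the soft sample.
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    return y_hard - y_soft.detach() + y_soft

# Hypothetical example: decoder logits over a vocabulary of size 8
# for a 5-token translated sentence.
logits = torch.randn(5, 8, requires_grad=True)
tokens = gumbel_softmax_st(logits, tau=0.5)
# `tokens` is exactly one-hot, yet gradients still reach `logits`,
# so the generating NMT model can receive gradient signal from the other model.
tokens.sum().backward()
print(logits.grad is not None)  # True
```

In a BT pipeline, such a differentiable relaxation is what would let the loss of the second NMT model backpropagate into the first one; the paper's CRT is proposed as an alternative to this Gumbel-softmax approach.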