Paper Title
Aligned Cross Entropy for Non-Autoregressive Machine Translation
Paper Authors
Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy
Paper Abstract
Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can highly penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.
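The core of the abstract is the monotonic-alignment dynamic program. Below is a minimal NumPy sketch of what such a loss could look like; the three transitions (align a target token to a prediction, skip a prediction by emitting a blank token, and re-align a target token to an already-used prediction), as well as the `axe_loss` name and `blank_id` vocabulary entry, are illustrative assumptions based on the abstract's description rather than the paper's exact recurrence.

```python
import numpy as np

def axe_loss(log_probs, target, blank_id):
    """Cost of the best monotonic alignment between gold target tokens
    and per-position model predictions (lower is better).

    log_probs: (m, V) log-probabilities, one row per prediction position.
    target:    (n,) gold token ids.
    blank_id:  id of a special "blank" token emitted when a prediction
               position is skipped (a hypothetical vocabulary entry).
    """
    m = log_probs.shape[0]
    n = target.shape[0]

    # A[i, j]: minimal cost of aligning the first i target tokens
    # with the first j prediction positions.
    A = np.full((n + 1, m + 1), np.inf)
    A[0, 0] = 0.0
    # No target tokens consumed yet: every prediction so far is skipped.
    for j in range(1, m + 1):
        A[0, j] = A[0, j - 1] - log_probs[j - 1, blank_id]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = -log_probs[j - 1, target[i - 1]]
            A[i, j] = min(
                A[i - 1, j - 1] + match,                   # align target i to prediction j
                A[i, j - 1] - log_probs[j - 1, blank_id],  # skip prediction j (emit blank)
                A[i - 1, j] + match,                       # align target i to the same prediction j
            )
    return A[n, m]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, vocab = 5, 6                  # 5 prediction positions, toy vocabulary of 6
    logits = rng.normal(size=(m, vocab))
    log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    target = np.array([3, 1, 4])     # 3 gold tokens, fewer than m predictions
    print(axe_loss(log_probs, target, blank_id=0))
```

Because the table entries are minima over sums of log-probabilities, the resulting loss is differentiable almost everywhere, consistent with the abstract's "differentiable dynamic program"; a training version would express the same recurrence in an autodiff framework so gradients flow back to the model's predictions.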