Paper title
Rare but Severe Neural Machine Translation Errors Induced by Minimal Deletion: An Empirical Study on Chinese and English
Paper authors
Paper abstract
We examine how rare but severe errors can be induced in English-Chinese and Chinese-English in-domain neural machine translation by minimal deletion from the source text, using character-based models. Deleting a single character is enough to induce severe translation errors. We categorize these errors and compare the results of deleting single characters with those of deleting single words. We also examine how training data size affects the number and types of pathological cases induced by these minimal perturbations, and find significant variation. Deleting a word hurts the overall translation score more than deleting a character, but certain errors are more likely to occur when deleting characters, and language direction also influences the effect.
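The minimal perturbations described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names are hypothetical, whitespace is treated as a deletable character, and word deletion assumes whitespace-delimited words, which suits English but not unsegmented Chinese (a word segmenter would be needed there and is not shown).

```python
from typing import List


def single_char_deletions(text: str) -> List[str]:
    """Return every variant of `text` with exactly one character removed.

    Assumption: every code point, including spaces, is a deletion candidate.
    """
    return [text[:i] + text[i + 1:] for i in range(len(text))]


def single_word_deletions(text: str) -> List[str]:
    """Return every variant of `text` with exactly one word removed.

    Assumption: words are whitespace-delimited (works for English only).
    """
    words = text.split()
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]


if __name__ == "__main__":
    source = "the cat sat"
    # Each variant would be fed to the translation model and its output
    # compared against the translation of the unperturbed source.
    print(single_char_deletions(source)[:3])
    print(single_word_deletions(source))
```

A sentence of n characters yields n character-deletion variants, so the search for pathological cases grows linearly with source length; repeated adjacent characters can produce duplicate variants, which a real experiment might want to deduplicate.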