Paper Title
TeaForN: Teacher-Forcing with N-grams
Paper Authors
Paper Abstract
Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
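To make the mechanism in the abstract concrete, below is a minimal sketch of the TeaForN training loop, assuming a shared GRU decoder cell and soft (expected) embeddings passed between decoder levels. The class name `TeaForNDecoder`, the choice of GRU cell, and all hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of TeaForN-style training, NOT the paper's reference code.
# Level 1 is standard teacher forcing; levels 2..N continue decoding along a
# secondary time axis from the model's own soft predictions, so each update
# is based on N prediction steps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeaForNDecoder(nn.Module):  # hypothetical name, for exposition only
    def __init__(self, vocab_size, hidden_size, n_levels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.cell = nn.GRUCell(hidden_size, hidden_size)  # shared across levels
        self.out = nn.Linear(hidden_size, vocab_size)
        self.n_levels = n_levels

    def forward(self, targets, h0):
        """targets: (batch, T) gold token ids; h0: (batch, hidden_size)."""
        batch, T = targets.shape
        loss = torch.zeros((), device=targets.device)
        h = h0
        for t in range(T - 1):
            # Primary time axis: one ordinary teacher-forced step on the
            # gold token at position t (decoder level 1).
            h = self.cell(self.embed(targets[:, t]), h)
            logits = self.out(h)
            loss = loss + F.cross_entropy(logits, targets[:, t + 1])
            # Secondary time axis: levels 2..N decode from the model's own
            # (soft) outputs instead of gold tokens, predicting t+2, t+3, ...
            h_level = h
            for k in range(2, self.n_levels + 1):
                if t + k >= T:
                    break
                probs = F.softmax(logits, dim=-1)
                soft_emb = probs @ self.embed.weight  # expected embedding
                h_level = self.cell(soft_emb, h_level)
                logits = self.out(h_level)
                loss = loss + F.cross_entropy(logits, targets[:, t + k])
        return loss
```

A short usage example under the same assumptions: the soft-embedding hand-off keeps every level differentiable, which is one way to address the lack of differentiability across timesteps that the abstract mentions.

```python
dec = TeaForNDecoder(vocab_size=100, hidden_size=32, n_levels=3)
tokens = torch.randint(0, 100, (4, 10))      # toy gold sequences
loss = dec(tokens, torch.zeros(4, 32))       # N-step training loss
loss.backward()                              # gradients flow through all levels
```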