Paper Title
TeaForN: Teacher-Forcing with N-grams
Paper Authors
Paper Abstract
Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
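To make the mechanism in the abstract concrete, below is a minimal sketch of the TeaForN training loop, assuming a shared GRU decoder cell and soft (expected) embeddings passed between decoder levels. The class name `TeaForNDecoder`, the choice of GRU cell, and all hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of TeaForN-style training, NOT the paper's reference code.
# Level 1 is standard teacher forcing; levels 2..N continue decoding along a
# secondary time axis from the model's own soft predictions, so each update
# is based on N prediction steps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeaForNDecoder(nn.Module):  # hypothetical name, for exposition only
    def __init__(self, vocab_size, hidden_size, n_levels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.cell = nn.GRUCell(hidden_size, hidden_size)  # shared across levels
        self.out = nn.Linear(hidden_size, vocab_size)
        self.n_levels = n_levels

    def forward(self, targets, h0):
        """targets: (batch, T) gold token ids; h0: (batch, hidden_size)."""
        batch, T = targets.shape
        loss = torch.zeros((), device=targets.device)
        h = h0
        for t in range(T - 1):
            # Primary time axis: one ordinary teacher-forced step on the
            # gold token at position t (decoder level 1).
            h = self.cell(self.embed(targets[:, t]), h)
            logits = self.out(h)
            loss = loss + F.cross_entropy(logits, targets[:, t + 1])
            # Secondary time axis: levels 2..N decode from the model's own
            # (soft) outputs instead of gold tokens, predicting t+2, t+3, ...
            h_level = h
            for k in range(2, self.n_levels + 1):
                if t + k >= T:
                    break
                probs = F.softmax(logits, dim=-1)
                soft_emb = probs @ self.embed.weight  # expected embedding
                h_level = self.cell(soft_emb, h_level)
                logits = self.out(h_level)
                loss = loss + F.cross_entropy(logits, targets[:, t + k])
        return loss
```

A short usage example under the same assumptions: the soft-embedding hand-off keeps every level differentiable, which is one way to address the lack of differentiability across timesteps that the abstract mentions.

```python
dec = TeaForNDecoder(vocab_size=100, hidden_size=32, n_levels=3)
tokens = torch.randint(0, 100, (4, 10))      # toy gold sequences
loss = dec(tokens, torch.zeros(4, 32))       # N-step training loss
loss.backward()                              # gradients flow through all levels
```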