Paper Title

Hard-Coded Gaussian Attention for Neural Machine Translation

Paper Authors

Weiqiu You, Simeng Sun, Mohit Iyyer

Paper Abstract

Recent work has questioned the importance of the Transformer's multi-headed attention for achieving high translation quality. We push further in this direction by developing a "hard-coded" attention variant without any learned parameters. Surprisingly, replacing all learned self-attention heads in the encoder and decoder with fixed, input-agnostic Gaussian distributions minimally impacts BLEU scores across four different language pairs. However, additionally hard-coding cross attention (which connects the decoder to the encoder) significantly lowers BLEU, suggesting that it is more important than self-attention. Much of this BLEU drop can be recovered by adding just a single learned cross attention head to an otherwise hard-coded Transformer. Taken as a whole, our results offer insight into which components of the Transformer are actually important, which we hope will guide future work into the development of simpler and more efficient attention-based models.
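The core idea described above, replacing learned attention with fixed, input-agnostic Gaussian distributions over token positions, can be illustrated with a short sketch. The snippet below is a minimal illustration rather than the authors' implementation: it assumes each query position attends to key positions with weights given by a Gaussian centered at its own position (optionally shifted by `center_offset`), and the function name, argument names, and default standard deviation are chosen here purely for exposition.

```python
import torch


def hard_coded_gaussian_attention(values: torch.Tensor,
                                  std: float = 1.0,
                                  center_offset: int = 0) -> torch.Tensor:
    """Illustrative sketch of input-agnostic Gaussian attention.

    Instead of computing attention weights from queries and keys, query
    position i attends with weights from a fixed Gaussian centered at
    position i + center_offset. `values` has shape (batch, seq_len, d_model).
    Names and defaults here are assumptions for illustration only.
    """
    seq_len = values.size(1)
    positions = torch.arange(seq_len, dtype=torch.float32)
    centers = positions + center_offset
    # dist[i, j] = distance of key position j from query i's fixed center
    dist = positions.unsqueeze(0) - centers.unsqueeze(1)   # (seq_len, seq_len)
    scores = -dist.pow(2) / (2 * std ** 2)                 # log of unnormalized Gaussian
    weights = torch.softmax(scores, dim=-1)                # fixed, input-agnostic weights
    # the same weight matrix is applied to every example in the batch
    return torch.einsum("qk,bkd->bqd", weights, values)


# usage sketch: 2 sentences, 5 tokens each, 8-dimensional representations
x = torch.randn(2, 5, 8)
out = hard_coded_gaussian_attention(x, std=1.0, center_offset=1)  # attend one token ahead
```

Because the softmax is applied to the Gaussian log-density, each query receives a properly normalized, Gaussian-shaped distribution over key positions; and since the weights never depend on the input tokens, the weight matrix can be precomputed once per sentence length.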
