Paper Title
Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Paper Authors
Paper Abstract
Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different parts of the input. However, recent works have shown that most attention heads learn simple, and often redundant, positional patterns. In this paper, we propose to replace all but one attention head of each encoder layer with simple fixed -- non-learnable -- attentive patterns that are solely based on position and do not require any external knowledge. Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality and even increases BLEU scores by up to 3 points in low-resource scenarios.
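Below is a minimal sketch (not the authors' released code) of what a fixed, position-based encoder attention head could look like, assuming each fixed head simply attends to a hard-coded relative offset such as the previous, current, or next token; the function names and chosen offsets are illustrative assumptions, not taken from the paper.

```python
# Sketch of fixed (non-learnable) positional attention heads: instead of
# computing query-key dot products, each head attends to a fixed relative
# offset. No parameters are trained for these heads.
import torch


def fixed_attention_weights(seq_len: int, offset: int) -> torch.Tensor:
    """Return a (seq_len, seq_len) attention matrix in which query position i
    puts all of its probability mass on position i + offset, clamped to the
    sentence boundaries."""
    weights = torch.zeros(seq_len, seq_len)
    for i in range(seq_len):
        j = min(max(i + offset, 0), seq_len - 1)  # clamp at sentence edges
        weights[i, j] = 1.0
    return weights


def fixed_head_output(values: torch.Tensor, offset: int) -> torch.Tensor:
    """Apply one fixed head to value vectors of shape (seq_len, d_head):
    the output is simply the value vector gathered from each query's fixed
    target position."""
    seq_len = values.size(0)
    attn = fixed_attention_weights(seq_len, offset)  # no softmax, no learning
    return attn @ values


# Example: three fixed heads attending to the previous, current, and next token.
values = torch.randn(5, 8)  # 5 tokens, head dimension 8
outputs = [fixed_head_output(values, off) for off in (-1, 0, 1)]
```

In such a setup, all but one head per encoder layer would use patterns of this kind, while the remaining head keeps the standard learnable query-key attention.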