Paper Title
Attention is Not Only a Weight: Analyzing Transformers with Vector Norms
Paper Authors
Paper Abstract
Attention is a key component of Transformers, which have recently achieved considerable success in natural language processing. Hence, attention is being extensively studied to investigate various linguistic capabilities of Transformers, with a focus on analyzing the parallels between attention weights and specific linguistic phenomena. This paper shows that attention weights are only one of the two factors that determine the output of attention, and proposes a norm-based analysis that incorporates the second factor: the norm of the transformed input vectors. The findings of our norm-based analyses of BERT and a Transformer-based neural machine translation system include the following: (i) contrary to previous studies, BERT pays poor attention to special tokens, and (ii) reasonable word alignments can be extracted from the attention mechanisms of the Transformer. These findings provide insights into the inner workings of Transformers.
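To make the two factors concrete, the sketch below illustrates the idea from the abstract in NumPy for a single attention head: the output for a query token is a weighted sum of transformed input vectors, so each contribution is governed both by the attention weight and by the norm of the transformed vector. The matrices, dimensions, and random inputs here are hypothetical placeholders, not the paper's actual implementation.

```python
import numpy as np

# Minimal single-head sketch of a norm-based measure alpha_ij * ||f(x_j)||,
# where f(x_j) is the value transformation followed by the output projection.
# All weights and inputs are random placeholders for illustration only.
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 16, 8

X = rng.normal(size=(n_tokens, d_model))   # input vectors x_1, ..., x_n
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))
W_V = rng.normal(size=(d_model, d_head))
W_O = rng.normal(size=(d_head, d_model))

# Standard scaled dot-product attention weights alpha (each row sums to 1).
scores = (X @ W_Q) @ (X @ W_K).T / np.sqrt(d_head)
alpha = np.exp(scores - scores.max(axis=-1, keepdims=True))
alpha /= alpha.sum(axis=-1, keepdims=True)

# Transformed input vectors f(x_j) and their norms ||f(x_j)||.
f_x = (X @ W_V) @ W_O                      # shape: (n_tokens, d_model)
f_norm = np.linalg.norm(f_x, axis=-1)      # shape: (n_tokens,)

# A weight-based analysis inspects alpha alone; a norm-based analysis instead
# inspects ||alpha_ij * f(x_j)|| = alpha_ij * ||f(x_j)||, the second factor
# being the norm of the transformed vector.
norm_based = alpha * f_norm[None, :]

print("attention weights:\n", alpha.round(3))
print("norm-based measure:\n", norm_based.round(3))
```

Even with identical attention weights, tokens whose transformed vectors have small norms (as the paper reports for special tokens) contribute little to the attention output, which is what the norm-based measure captures.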