Paper Title

Horizontal and Vertical Attention in Transformers

Paper Authors

Litao Yu, Jian Zhang

Paper Abstract

Transformers are built upon multi-head scaled dot-product attention and positional encoding, which aim to learn the feature representations and token dependencies. In this work, we focus on enhancing the distinctive representation by learning to augment the feature maps with the self-attention mechanism in Transformers. Specifically, we propose the horizontal attention to re-weight the multi-head output of the scaled dot-product attention before dimensionality reduction, and propose the vertical attention to adaptively re-calibrate channel-wise feature responses by explicitly modelling inter-dependencies among different channels. We demonstrate that Transformer models equipped with the two attentions have a high generalization capability across different supervised learning tasks, with a very minor additional computational overhead. The proposed horizontal and vertical attentions are highly modular and can be inserted into various Transformer models to further improve the performance. Our code is available in the supplementary material.
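
The abstract describes how the two modules act on a Transformer block: horizontal attention re-weights the per-head outputs of scaled dot-product attention before they are concatenated and projected, and vertical attention re-calibrates channel-wise responses by modelling inter-dependencies among channels. Below is a minimal PyTorch sketch of one plausible reading of that description; the module names, the pooling-based summaries, the bottleneck gate design, and the reduction ratios are assumptions made here for illustration and are not taken from the authors' reference code.

```python
# Illustrative sketch only: a plausible interpretation of the horizontal and
# vertical attention described in the abstract, not the authors' implementation.
import torch
import torch.nn as nn


class HorizontalAttention(nn.Module):
    """Re-weights per-head outputs of scaled dot-product attention
    before concatenation and the output projection (dimensionality reduction)."""

    def __init__(self, num_heads: int, reduction: int = 2):
        super().__init__()
        hidden = max(num_heads // reduction, 1)
        self.gate = nn.Sequential(
            nn.Linear(num_heads, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_heads),
            nn.Sigmoid(),
        )

    def forward(self, heads: torch.Tensor) -> torch.Tensor:
        # heads: (batch, num_heads, seq_len, head_dim)
        summary = heads.mean(dim=(2, 3))           # pool each head to a scalar
        weights = self.gate(summary)               # (batch, num_heads) gates
        return heads * weights[:, :, None, None]   # re-weighted head outputs


class VerticalAttention(nn.Module):
    """Adaptively re-calibrates channel-wise feature responses by explicitly
    modelling inter-dependencies among channels (a squeeze-and-excitation-style gate)."""

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        hidden = max(dim // reduction, 1)
        self.gate = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        summary = x.mean(dim=1)                    # squeeze over the token axis
        weights = self.gate(summary)               # (batch, dim) channel gates
        return x * weights[:, None, :]             # re-calibrated channels
```

In this reading, HorizontalAttention would be applied to the stacked head outputs inside multi-head attention and VerticalAttention to the token features after it; both add only small gating MLPs, which is consistent with the abstract's claim of a very minor additional computational overhead.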
