Paper Title

Transformer with Fourier Integral Attentions

Paper Authors

Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho

Paper Abstract

Multi-head attention empowers the recent success of transformers, the state-of-the-art models that have achieved remarkable results in sequence modeling and beyond. These attention mechanisms compute the pairwise dot products between the queries and keys, which results from the use of unnormalized Gaussian kernels under the assumption that the queries follow a mixture of Gaussian distributions. There is no guarantee that this assumption is valid in practice. In response, we first interpret attention in transformers as a nonparametric kernel regression. We then propose the FourierFormer, a new class of transformers in which the dot-product kernels are replaced by novel generalized Fourier integral kernels. Unlike the dot-product kernels, which require choosing a good covariance matrix to capture the dependency between data features, the generalized Fourier integral kernels capture this dependency automatically and remove the need to tune the covariance matrix. We theoretically prove that our proposed Fourier integral kernels can efficiently approximate any key and query distributions. Compared to conventional transformers with dot-product attention, FourierFormers attain better accuracy and reduce the redundancy between attention heads. We empirically corroborate the advantages of FourierFormers over baseline transformers in a variety of practical applications, including language modeling and image classification.
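
To make the mechanism described in the abstract concrete, the sketch below contrasts standard dot-product softmax attention with the kernel-regression formulation, in which exp(q · k) is replaced by a Fourier-integral-style kernel. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the per-dimension sinc-product form sin(R(q_d − k_d)) / (π(q_d − k_d)), the bandwidth R, and all function names are illustrative.

# Minimal sketch (assumption: the generalized Fourier integral kernel is
# modeled as a per-dimension sinc product; R and all names are illustrative).
import numpy as np

def dot_product_attention(Q, K, V):
    """Standard softmax attention, for comparison. Q, K: (n, d); V: (n, d_v)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # pairwise dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V

def fourier_integral_attention(Q, K, V, R=4.0):
    """Attention as Nadaraya-Watson kernel regression, with exp(q . k) replaced
    by the kernel prod_d sin(R*(q_d - k_d)) / (pi*(q_d - k_d))."""
    diff = Q[:, None, :] - K[None, :, :]             # (n_q, n_k, d) pairwise gaps
    # np.sinc(x) = sin(pi*x)/(pi*x), so sin(R*t)/(pi*t) = (R/pi)*sinc(R*t/pi);
    # this form is finite at t = 0 and needs no explicit zero-division guard.
    kernel = np.prod((R / np.pi) * np.sinc(R * diff / np.pi), axis=-1)
    # Kernel-regression normalization: a plain sum rather than a softmax, since
    # the sinc-product kernel can be negative; guard a vanishing denominator.
    denom = kernel.sum(axis=-1, keepdims=True)
    weights = kernel / np.where(np.abs(denom) < 1e-9, 1e-9, denom)
    return weights @ V

# Toy usage on random data: both variants map (8, 16) queries to (8, 32) outputs.
rng = np.random.default_rng(0)
Q, K = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
V = rng.normal(size=(8, 32))
print(dot_product_attention(Q, K, V).shape)       # (8, 32)
print(fourier_integral_attention(Q, K, V).shape)  # (8, 32)

Note that the only hyperparameter in this sketch is the bandwidth R of the assumed sinc-product kernel; the appeal of the paper's construction is precisely that, unlike the dot-product kernel, no query-key covariance matrix needs to be tuned.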
