Paper Title

Kronecker Attention Networks

Paper Authors

Hongyang Gao, Zhengyang Wang, Shuiwang Ji

Paper Abstract

Attention operators have been applied on both 1-D data like texts and higher-order data such as images and videos. Use of attention operators on high-order data requires flattening of the spatial or spatial-temporal dimensions into a vector, which is assumed to follow a multivariate normal distribution. This not only incurs excessive requirements on computational resources, but also fails to preserve structures in data. In this work, we propose to avoid flattening by assuming the data follow matrix-variate normal distributions. Based on this new view, we develop Kronecker attention operators (KAOs) that operate on high-order tensor data directly. More importantly, the proposed KAOs lead to dramatic reductions in computational resources. Experimental results show that our methods reduce the amount of required computational resources by a factor of hundreds, with larger factors for higher-dimensional and higher-order data. Results also show that networks with KAOs outperform models without attention, while achieving performance competitive with that of models using the original attention operators.
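
To make the cost argument concrete, below is a minimal NumPy sketch. The `full_attention` baseline follows directly from the abstract: attention on a flattened H×W feature map builds an (H·W)×(H·W) attention matrix. The `kronecker_style_attention` function is an illustrative assumption, not the paper's exact KAO definition: it approximates the keys and values with row- and column-averaged descriptors, shrinking the attention matrix to (H·W)×(H+W). The function names and the exact averaging scheme are hypothetical choices made for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(x):
    """Standard self-attention on a flattened feature map.

    x: (H*W, C). The attention matrix is (H*W, H*W), so cost and
    memory grow quadratically in H*W.
    """
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def kronecker_style_attention(x):
    """Hedged sketch of a KAO-style operator (assumed mechanism,
    not the authors' exact formulation): keys/values come from
    row- and column-averaged descriptors, so the attention matrix
    is (H*W, H+W) instead of (H*W, H*W).

    x: (H, W, C).
    """
    h, w, c = x.shape
    row_desc = x.mean(axis=1)                          # (H, C): average over width
    col_desc = x.mean(axis=0)                          # (W, C): average over height
    kv = np.concatenate([row_desc, col_desc], axis=0)  # (H+W, C)
    q = x.reshape(h * w, c)                            # queries from the full map
    scores = q @ kv.T / np.sqrt(c)                     # (H*W, H+W)
    return (softmax(scores) @ kv).reshape(h, w, c)

# For a 64x64 map, the attention matrix shrinks from 4096x4096 to
# 4096x128, a factor of H*W/(H+W) = 32; the factor grows with
# resolution, consistent with the abstract's claim of larger savings
# for higher-dimensional and higher-order data.
x = np.random.randn(64, 64, 16).astype(np.float32)
out = kronecker_style_attention(x)
print(out.shape)  # (64, 64, 16)
```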
