Paper Title

Visual Attention Emerges from Recurrent Sparse Reconstruction

Authors

Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang

Abstract

Visual attention helps achieve robust perception under noise, corruption, and distribution shifts in human vision, which are areas where modern neural networks still fall short. We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity. Related features are grouped together via recurrent connections between neurons, with salient objects emerging via sparse regularization. VARS adopts an attractor network with recurrent connections that converges toward a stable pattern over time. Network layers are represented as ordinary differential equations (ODEs), formulating attention as a recurrent attractor network that equivalently optimizes the sparse reconstruction of input using a dictionary of "templates" encoding underlying patterns of data. We show that self-attention is a special case of VARS with a single-step optimization and no sparsity constraint. VARS can be readily used as a replacement for self-attention in popular vision transformers, consistently improving their robustness across various benchmarks. Code is released on GitHub (https://github.com/bfshi/VARS).
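The abstract's core idea — attention as recurrent optimization of a sparse reconstruction of the input against a dictionary of templates — can be illustrated with a small numerical sketch. This is not the paper's implementation; it is a generic ISTA-style iteration (gradient step on the reconstruction error plus a soft-threshold for the L1 sparsity term), with hypothetical names (`vars_sketch`, `soft_threshold`) and NumPy standing in for a real vision-transformer layer. Setting `lam=0` and `steps=1` recovers a single unconstrained reconstruction step, loosely mirroring the paper's claim that self-attention is a special case with one optimization step and no sparsity constraint.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the L1 norm: shrinks codes toward zero,
    # which is what produces sparsity in the reconstruction.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def vars_sketch(x, D, lam=0.1, steps=10, lr=0.1):
    """Hedged sketch of recurrent sparse reconstruction (not the official VARS code).

    x: (n, d) input tokens; D: (k, d) dictionary of 'templates'.
    Iteratively minimizes 0.5 * ||a @ D - x||^2 + lam * ||a||_1 over codes a,
    then returns the reconstruction a @ D as the attention output.
    """
    a = np.zeros((x.shape[0], D.shape[0]))  # sparse codes, one row per token
    for _ in range(steps):
        grad = (a @ D - x) @ D.T            # gradient of the reconstruction error
        a = soft_threshold(a - lr * grad, lr * lam)
    return a @ D

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))             # 5 tokens, 8-dim features
D = rng.standard_normal((4, 8))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm templates

out = vars_sketch(x, D, lam=0.1, steps=20)     # recurrent, sparse
one_step = vars_sketch(x, D, lam=0.0, steps=1) # "self-attention-like" special case
```

With `lam=0` the iteration is plain gradient descent on the least-squares reconstruction, so more steps drive the output closer to the span of the templates; the sparsity penalty instead keeps only the templates that matter, which is the mechanism the abstract credits for salient objects emerging.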
