Paper Title

Deformable Siamese Attention Networks for Visual Object Tracking

Authors

Yuechen Yu, Yilei Xiong, Weilin Huang, Matthew R. Scott

Abstract

Siamese-based trackers have achieved excellent performance on visual object tracking. However, the target template is not updated online, and the features of the target template and search image are computed independently in a Siamese architecture. In this paper, we propose Deformable Siamese Attention Networks, referred to as SiamAttn, by introducing a new Siamese attention mechanism that computes deformable self-attention and cross-attention. The self-attention learns strong context information via spatial attention, and selectively emphasizes interdependent channel-wise features with channel attention. The cross-attention is capable of aggregating rich contextual inter-dependencies between the target template and the search image, providing an implicit manner to adaptively update the target template. In addition, we design a region refinement module that computes depth-wise cross-correlations between the attentional features for more accurate tracking. We conduct experiments on six benchmarks, where our method achieves new state-of-the-art results, outperforming the strong baseline, SiamRPN++ [24], by 0.464 -> 0.537 and 0.415 -> 0.470 EAO on VOT 2016 and 2018. Our code is available at: https://github.com/msight-tech/research-siamattn.
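As background for the depth-wise cross-correlation operation the abstract mentions, below is a minimal PyTorch sketch of how this operation is commonly implemented in SiamRPN++-style trackers, using a grouped convolution so each template channel is correlated only with its matching search-feature channel. The function name `depthwise_xcorr` and the tensor shapes are illustrative assumptions for this sketch, not code taken from the paper's repository.

```python
import torch
import torch.nn.functional as F

def depthwise_xcorr(search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
    """Depth-wise cross-correlation: each channel of the template feature map
    acts as a per-channel filter slid over the corresponding channel of the
    search feature map, producing one response map per channel.

    search:   (B, C, Hs, Ws) features of the search image
    template: (B, C, Ht, Wt) features of the target template (Ht <= Hs, Wt <= Ws)
    returns:  (B, C, Hs - Ht + 1, Ws - Wt + 1) correlation responses
    """
    b, c, h, w = search.shape
    # Fold the batch into the channel dimension so a single grouped conv2d
    # correlates every (batch, channel) pair independently.
    x = search.reshape(1, b * c, h, w)
    kernel = template.reshape(b * c, 1, template.size(2), template.size(3))
    out = F.conv2d(x, kernel, groups=b * c)
    return out.reshape(b, c, out.size(2), out.size(3))

# Example with made-up feature sizes: a 6x6 template map correlated over a
# 26x26 search map yields a 21x21 response per channel.
z = torch.randn(1, 256, 6, 6)    # template features
x = torch.randn(1, 256, 26, 26)  # search features
r = depthwise_xcorr(x, z)        # -> torch.Size([1, 256, 21, 21])
```

Keeping the correlation depth-wise (one response map per channel, rather than summing across channels) preserves channel-specific similarity information, which is what allows a subsequent refinement head to operate on a multi-channel response.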
