Paper Title

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

Paper Authors

Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Xian-Sheng Hua, Lei Zhang

Paper Abstract

Transformer-based methods have recently achieved great advancement on 2D image-based vision tasks. For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers on video data will bring heavy computation and memory burdens due to the largely increased number of patches and the quadratic complexity of self-attention computation. How to efficiently and effectively model the 3D self-attention of video data has been a great challenge for transformers. In this paper, we propose a Temporal Patch Shift (TPS) method for efficient 3D self-attention modeling in transformers for video-based action recognition. TPS shifts part of patches with a specific mosaic pattern in the temporal dimension, thus converting a vanilla spatial self-attention operation to a spatiotemporal one with little additional cost. As a result, we can compute 3D self-attention using nearly the same computation and memory cost as 2D self-attention. TPS is a plug-and-play module and can be inserted into existing 2D transformer models to enhance spatiotemporal feature learning. The proposed method achieves competitive performance with state-of-the-arts on Something-something V1 & V2, Diving-48, and Kinetics400 while being much more efficient on computation and memory cost. The source code of TPS can be found at https://github.com/MartinXM/TPS.
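To make the shifting idea concrete, below is a minimal PyTorch sketch of a temporal patch shift followed by ordinary per-frame self-attention. The checkerboard-style pattern (every fourth spatial position borrowing a token from the previous or next frame), the tensor shapes, and the use of `torch.nn.MultiheadAttention` are illustrative assumptions rather than the authors' exact design; the official shift patterns and implementation are in the repository linked above.

```python
# Minimal sketch of the temporal patch shift idea, assuming a simple pattern in
# which every fourth spatial position borrows its token from the previous frame
# and another fourth from the next frame (the paper's actual mosaic patterns are
# in the official repo). Shapes: (B, T, N, C) = batch, frames, patches, channels.
import torch

def temporal_patch_shift(x: torch.Tensor) -> torch.Tensor:
    B, T, N, C = x.shape
    out = x.clone()
    pos = torch.arange(N)
    from_prev = pos % 4 == 1   # these positions take tokens from frame t-1
    from_next = pos % 4 == 3   # these positions take tokens from frame t+1
    # torch.roll wraps around at the clip boundary; a real implementation
    # would decide how to handle the first/last frame explicitly.
    out[:, :, from_prev] = torch.roll(x[:, :, from_prev], shifts=1, dims=1)
    out[:, :, from_next] = torch.roll(x[:, :, from_next], shifts=-1, dims=1)
    return out

if __name__ == "__main__":
    B, T, N, C = 2, 8, 196, 768            # e.g. 14x14 patches per frame, ViT-B width
    tokens = torch.randn(B, T, N, C)
    shifted = temporal_patch_shift(tokens)
    # Plain per-frame (2D) self-attention now mixes information across time,
    # because some tokens in each frame were borrowed from neighbouring frames,
    # at essentially the same compute and memory cost as spatial-only attention.
    attn = torch.nn.MultiheadAttention(embed_dim=C, num_heads=12, batch_first=True)
    frames = shifted.reshape(B * T, N, C)
    y, _ = attn(frames, frames, frames)
    print(y.shape)                         # torch.Size([16, 196, 768])
```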
