Paper Title

Skeleton-based Action Recognition via Spatial and Temporal Transformer Networks

Authors

Chiara Plizzari, Marco Cannici, Matteo Matteucci

Abstract

Skeleton-based Human Activity Recognition has achieved great interest in recent years as skeleton data has demonstrated being robust to illumination changes, body scales, dynamic camera views, and complex background. In particular, Spatial-Temporal Graph Convolutional Networks (ST-GCN) demonstrated to be effective in learning both spatial and temporal dependencies on non-Euclidean data such as skeleton graphs. Nevertheless, an effective encoding of the latent information underlying the 3D skeleton is still an open problem, especially when it comes to extracting effective information from joint motion patterns and their correlations. In this work, we propose a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator. In our ST-TR model, a Spatial Self-Attention module (SSA) is used to understand intra-frame interactions between different body parts, and a Temporal Self-Attention module (TSA) to model inter-frame correlations. The two are combined in a two-stream network, whose performance is evaluated on three large-scale datasets, NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics Skeleton 400, consistently improving backbone results. Compared with methods that use the same input data, the proposed ST-TR achieves state-of-the-art performance on all datasets when using joints' coordinates as input, and results on-par with state-of-the-art when adding bones information.
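The core idea of the two attention modules can be sketched in plain NumPy: SSA applies self-attention across joints within each frame, while TSA applies it across frames for each joint. This is an illustrative toy, not the authors' implementation; the single-head formulation, shared projection weights, and tensor shapes are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.
    x: (n, c) -> (n, c), attending over the n positions."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return scores @ v

# Toy skeleton sequence: T frames, V joints, C channels (e.g. 3D coordinates).
T, V, C = 4, 25, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((T, V, C))
wq, wk, wv = (rng.standard_normal((C, C)) for _ in range(3))

# SSA: intra-frame — for each frame, joints attend to each other.
ssa = np.stack([self_attention(x[t], wq, wk, wv) for t in range(T)])

# TSA: inter-frame — for each joint, its trajectory attends over time.
tsa = np.stack([self_attention(x[:, v], wq, wk, wv) for v in range(V)], axis=1)

print(ssa.shape, tsa.shape)  # (4, 25, 3) (4, 25, 3)
```

In the paper's two-stream design, outputs like these would feed two separate network branches whose scores are fused; here only the attention factorization (joints vs. frames) is shown.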
