Paper Title

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

Paper Authors

Ziyu Liu, Hongwen Zhang, Zhenghao Chen, Zhiyong Wang, Wanli Ouyang

Paper Abstract

Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
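The two ideas in the abstract can be sketched concretely. A minimal NumPy sketch follows, under stated assumptions: `A` is a binary joint-adjacency matrix, and the function names `k_adjacency` and `window_adjacency` are illustrative, not the authors' released code. Disentangled aggregation keeps, at scale k, only joint pairs whose shortest-path distance is exactly k (rather than the biased raw powers A^k), and the G3D-style window tiles the spatial adjacency across τ frames to form dense cross-spacetime edges:

```python
import numpy as np

def k_adjacency(A, k):
    """Disentangled scale-k adjacency: binarize (A+I)^k and subtract the
    binarized (A+I)^(k-1), so only joints at shortest-path distance
    exactly k remain, removing the bias of raw adjacency powers."""
    n = len(A)
    if k == 0:
        return np.eye(n)
    reach_k = np.minimum(np.linalg.matrix_power(A + np.eye(n), k), 1)
    reach_k_minus_1 = np.minimum(np.linalg.matrix_power(A + np.eye(n), k - 1), 1)
    return reach_k - reach_k_minus_1

def window_adjacency(A, tau):
    """G3D-style cross-spacetime adjacency over a window of tau frames:
    tiling the self-looped spatial adjacency into a (tau*N, tau*N) block
    matrix connects every joint to its spatial neighbors in every frame
    of the window, giving direct cross-spacetime skip connections."""
    n = len(A)
    A_tilde = np.minimum(A + np.eye(n), 1)  # add self-loops, keep binary
    return np.tile(A_tilde, (tau, tau))
```

For example, on a 4-joint chain graph 0-1-2-3, `k_adjacency(A, 2)` links only the pairs two hops apart, such as joints 0 and 2, while `window_adjacency(A, 3)` yields a 12x12 block matrix whose off-diagonal blocks are the cross-frame edges.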
