A Survey on 3D Skeleton-Based Action Recognition Using Learning Method

Paper Title

A Survey on 3D Skeleton-Based Action Recognition Using Learning Method

Authors

Ren, Bin, Liu, Mengyuan, Ding, Runwei, Liu, Hong

Abstract

3D skeleton-based action recognition (3D SAR) has gained significant attention within the computer vision community, owing to the inherent advantages offered by skeleton data. As a result, a plethora of impressive works, including those based on conventional handcrafted features and learned feature extraction methods, have been conducted over the years. However, prior surveys on action recognition have primarily focused on video or RGB data-dominated approaches, with limited coverage of skeleton data. Furthermore, despite the extensive application of deep learning methods in this field, there has been a notable absence of research that provides an introductory or comprehensive review from the perspective of deep learning architectures. To address these limitations, this survey first underscores the importance of action recognition and emphasizes the significance of 3D skeleton data as a valuable modality. Subsequently, we provide a comprehensive introduction to mainstream action recognition techniques based on four fundamental deep architectures, i.e., Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Graph Convolutional Networks (GCNs), and Transformers. All methods with the corresponding architectures are then presented in a data-driven manner with detailed discussion. Finally, we offer insights into the current largest 3D skeleton dataset, NTU-RGB+D, and its new edition, NTU-RGB+D 120, along with an overview of several top-performing algorithms on these datasets. To the best of our knowledge, this research represents the first comprehensive discussion of deep learning-based action recognition using 3D skeleton data.
