3DFCNN：使用具有原始深度信息的3D深神经网络的实时操作识别

论文标题

3DFCNN：使用具有原始深度信息的3D深神经网络的实时操作识别

3DFCNN: Real-Time Action Recognition using 3D Deep Neural Networks with Raw Depth Information

论文作者

Sanchez-Caballero, Adrian, de López-Diz, Sergio, Fuentes-Jimenez, David, Losada-Gutiérrez, Cristina, Marrón-Romera, Marta, Casillas-Perez, David, Sarker, Mohammad Ibrahim

论文摘要

人为的行动识别是人造视觉中的一项基本任务，由于其在不同领域的多次应用，近年来近年来非常重要。％，例如研究人类行为，安全或视频监视。在这种情况下，本文介绍了一种通过RGB-D摄像机提供的原始深度图像序列中实时人类动作识别的方法。该提案基于一个名为3DFCNN的3D完全卷积神经网络，该网络将自动编码来自没有％昂贵的预处理的深度序列的时空模式。此外，描述的3D-CNN允许从深度序列的空间和时间编码信息中提取％自动特征提取和动作分类。深度数据的使用确保了保护人们的隐私％的行动识别，可以识别人们所采取的行动，保护其隐私％\ sout {其中的}，因为从这些数据中无法识别其身份。％\ st {来自深度图像。} 3DFCNN及其结果与三种广泛使用的％大规模NTU RGB+D数据集中的其他最先进方法相比，其结果与具有不同的特征（分辨率，传感器类型，视图，相机位置等）相比。获得的结果允许验证该提案，得出的结论是，它的表现优于基于经典计算机视觉技术的几种最新方法。此外，它可以实现与基于深度学习的最先进方法相媲美的动作识别精度，其计算成本较低，从而可以在实时应用中使用。

Human actions recognition is a fundamental task in artificial vision, that has earned a great importance in recent years due to its multiple applications in different areas. %, such as the study of human behavior, security or video surveillance. In this context, this paper describes an approach for real-time human action recognition from raw depth image-sequences, provided by an RGB-D camera. The proposal is based on a 3D fully convolutional neural network, named 3DFCNN, which automatically encodes spatio-temporal patterns from depth sequences without %any costly pre-processing. Furthermore, the described 3D-CNN allows %automatic features extraction and actions classification from the spatial and temporal encoded information of depth sequences. The use of depth data ensures that action recognition is carried out protecting people's privacy% allows recognizing the actions carried out by people, protecting their privacy%\sout{of them} , since their identities can not be recognized from these data. %\st{ from depth images.} 3DFCNN has been evaluated and its results compared to those from other state-of-the-art methods within three widely used %large-scale NTU RGB+D datasets, with different characteristics (resolution, sensor type, number of views, camera location, etc.). The obtained results allows validating the proposal, concluding that it outperforms several state-of-the-art approaches based on classical computer vision techniques. Furthermore, it achieves action recognition accuracy comparable to deep learning based state-of-the-art methods with a lower computational cost, which allows its use in real-time applications.

下载PDF全文

下载文献需遵守相关版权规定

论文标题