Paper Title

PSUMNet: Unified Modality Part Streams are All You Need for Efficient Pose-based Action Recognition

Paper Authors

Neel Trivedi, Ravi Kiran Sarvadevabhatla

Paper Abstract

Pose-based action recognition is predominantly tackled by approaches which treat the input skeleton in a monolithic fashion, i.e. joints in the pose tree are processed as a whole. However, such approaches ignore the fact that action categories are often characterized by localized action dynamics involving only small subsets of part joint groups, e.g. those involving hands ('Thumbs up') or legs ('Kicking'). Although part-grouping based approaches exist, each part group is not considered within the global pose frame, causing such methods to fall short. Further, conventional approaches employ independent modality streams (e.g. joint, bone, joint velocity, bone velocity) and train their network multiple times on these streams, which massively increases the number of training parameters. To address these issues, we introduce PSUMNet, a novel approach for scalable and efficient pose-based action recognition. At the representation level, we propose a global frame based part stream approach as opposed to conventional modality based streams. Within each part stream, the associated data from multiple modalities is unified and consumed by the processing pipeline. Experimentally, PSUMNet achieves state-of-the-art performance on the widely used NTURGB+D 60/120 datasets and the dense joint skeleton datasets NTU 60-X/120-X. PSUMNet is highly efficient and outperforms competing methods which use 100%-400% more parameters. PSUMNet also generalizes to the SHREC hand gesture dataset with competitive performance. Overall, PSUMNet's scalability, performance and efficiency make it an attractive choice for action recognition and for deployment on compute-restricted embedded and edge devices. Code and pretrained models can be accessed at https://github.com/skelemoa/psumnet.
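To make the two core ideas in the abstract concrete, below is a minimal PyTorch sketch of (1) unifying the modalities (joint, bone, joint velocity, bone velocity) inside each stream instead of training separate modality networks, and (2) slicing joints into part streams while keeping coordinates in the global frame. The kinematic tree, part groupings, stand-in backbone, and averaging fusion are all illustrative assumptions for this sketch; they are not the authors' released implementation (see the repo linked above for that).

```python
# Sketch: unified-modality part streams over a global-frame skeleton.
# Everything marked "illustrative" is an assumption of this sketch.
import torch
import torch.nn as nn

N, C, T, V = 8, 3, 64, 25          # batch, coords, frames, joints (NTU-style)
x = torch.randn(N, C, T, V)        # joint positions in the GLOBAL frame

# Illustrative parent index per joint (0-based, NTU-style kinematic tree).
parents = torch.tensor([1, 20, 20, 2, 20, 4, 5, 6, 20, 8, 9, 10, 0, 12, 13,
                        14, 0, 16, 17, 18, 20, 22, 7, 24, 11])

def unify_modalities(joints):
    """Stack joint, bone, and velocity modalities along the channel axis."""
    bone = joints - joints[..., parents]                   # joint minus parent
    jvel = torch.zeros_like(joints)
    jvel[:, :, 1:] = joints[:, :, 1:] - joints[:, :, :-1]  # temporal diff
    bvel = torch.zeros_like(bone)
    bvel[:, :, 1:] = bone[:, :, 1:] - bone[:, :, :-1]
    return torch.cat([joints, bone, jvel, bvel], dim=1)    # (N, 4C, T, V)

# Illustrative part groups. Joints are only SELECTED, never re-centered,
# so each stream still sees the global pose frame.
part_groups = {
    "body":  list(range(21)),
    "hands": [5, 6, 7, 9, 10, 11, 21, 22, 23, 24],
    "legs":  [0, 12, 13, 14, 15, 16, 17, 18, 19],
}

def tiny_stream(in_ch, num_classes=60):
    """Stand-in for a real per-stream spatio-temporal backbone."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(32, num_classes),  # e.g. 60 classes for NTURGB+D 60
    )

streams = nn.ModuleDict({name: tiny_stream(4 * C) for name in part_groups})

# Late fusion by averaging per-stream logits (one plausible choice).
logits = torch.stack([
    streams[name](unify_modalities(x)[..., torch.tensor(group)])
    for name, group in part_groups.items()
]).mean(dim=0)
print(logits.shape)                # torch.Size([8, 60])
```

Note how this setup reflects the parameter-efficiency claim in the abstract: because all four modalities are concatenated channel-wise inside each stream, one training run covers them all, instead of training a separate network per modality as in conventional multi-stream pipelines.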
