Paper Title

Generative Action Description Prompts for Skeleton-based Action Recognition

Paper Authors

Wangmeng Xiang, Chao Li, Yuxuan Zhou, Biao Wang, Lei Zhang

Paper Abstract

Skeleton-based action recognition has recently received considerable attention. Current approaches to skeleton-based action recognition are typically formulated as one-hot classification tasks and do not fully exploit the semantic relations between actions. For example, "make victory sign" and "thumb up" are two hand-gesture actions whose major difference lies in the movement of the hands. This information cannot be inferred from the categorical one-hot encoding of action classes, but it can be revealed by the action description. Therefore, utilizing action descriptions in training could potentially benefit representation learning. In this work, we propose a Generative Action-description Prompts (GAP) approach for skeleton-based action recognition. More specifically, we employ a pre-trained large-scale language model as a knowledge engine to automatically generate text descriptions of the body-part movements of actions, and propose a multi-modal training scheme that uses the text encoder to generate feature vectors for different body parts and supervise the skeleton encoder for action representation learning. Experiments show that our proposed GAP method achieves noticeable improvements over various baseline models without extra computational cost at inference. GAP achieves new state-of-the-art results on popular skeleton-based action recognition benchmarks, including NTU RGB+D, NTU RGB+D 120 and NW-UCLA. The source code is available at https://github.com/MartinXM/GAP.
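
To make the training scheme described in the abstract concrete, below is a minimal PyTorch sketch of the general idea: per-class, per-body-part text features (precomputed offline by encoding generated descriptions with a frozen text encoder) supervise a skeleton encoder's part features alongside the usual classification loss. All names here (SkeletonEncoder, gap_loss, NUM_PARTS, the alpha weight, the toy 75-dimensional input) are illustrative assumptions, not the API of the actual GAP repository; see the linked source code for the authors' implementation.

```python
# Minimal sketch of a GAP-style multi-modal training step, under the
# assumptions stated above. The backbone is a toy stand-in; a real model
# would use a graph-convolutional or transformer skeleton encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 60   # e.g. NTU RGB+D
NUM_PARTS = 4      # assumed body-part partition (head, hands, torso, legs)
EMBED_DIM = 256

class SkeletonEncoder(nn.Module):
    """Stand-in skeleton backbone producing per-part features and logits."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(75, EMBED_DIM)  # toy placeholder backbone
        self.part_heads = nn.ModuleList(
            [nn.Linear(EMBED_DIM, EMBED_DIM) for _ in range(NUM_PARTS)])
        self.classifier = nn.Linear(EMBED_DIM, NUM_CLASSES)

    def forward(self, x):
        h = self.backbone(x)                           # (B, EMBED_DIM)
        parts = [head(h) for head in self.part_heads]  # per-part features
        return torch.stack(parts, dim=1), self.classifier(h)

def gap_loss(part_feats, logits, text_feats, labels, alpha=0.5):
    """Cross-entropy plus alignment of part features to frozen text features.

    text_feats: (NUM_CLASSES, NUM_PARTS, EMBED_DIM), precomputed offline
    by encoding language-model-generated part descriptions with a frozen
    text encoder. alpha is an assumed loss-balancing weight.
    """
    ce = F.cross_entropy(logits, labels)
    target = text_feats[labels]  # text features of each ground-truth class
    align = 1 - F.cosine_similarity(part_feats, target, dim=-1).mean()
    return ce + alpha * align

# Toy usage with random tensors standing in for real data.
model = SkeletonEncoder()
x = torch.randn(8, 75)                                 # 8 skeleton samples
labels = torch.randint(0, NUM_CLASSES, (8,))
text_feats = torch.randn(NUM_CLASSES, NUM_PARTS, EMBED_DIM)  # frozen
loss = gap_loss(*model(x), text_feats, labels)
loss.backward()
```

Because the text features are fixed after pre-computation and the alignment branch is dropped at test time, a scheme of this shape adds no computation at inference, consistent with the abstract's claim.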
