Paper title
Hierarchical Contrast for Unsupervised Skeleton-based Action Representation Learning
Authors
Abstract
This paper targets unsupervised skeleton-based action representation learning and proposes a new Hierarchical Contrast (HiCo) framework. Unlike existing contrastive solutions, which typically represent an input skeleton sequence as instance-level features and perform contrast holistically, HiCo represents the input as multiple-level features and performs contrast in a hierarchical manner. Specifically, given a human skeleton sequence, we represent it as multiple feature vectors of different granularities from both the temporal and spatial domains via sequence-to-sequence (S2S) encoders and unified downsampling modules. The hierarchical contrast is conducted at four levels: instance level, domain level, clip level, and part level. Moreover, HiCo is orthogonal to the S2S encoder, which allows us to flexibly adopt state-of-the-art S2S encoders. Extensive experiments on four datasets, i.e., NTU-60, NTU-120, PKU-MMD I and II, show that HiCo achieves new state-of-the-art results for unsupervised skeleton-based action representation learning on two downstream tasks, action recognition and retrieval, and that its learned action representation transfers well. We also show that our framework is effective for semi-supervised skeleton-based action recognition. Our code is available at https://github.com/HuiGuanLab/HiCo.
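To make the idea of contrasting at several granularities more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (that is in the repository linked above) and it makes several assumptions the abstract does not state. Average pooling of adjacent time steps stands in for the "unified downsampling modules". An InfoNCE-style loss is assumed as the contrastive objective. Only the temporal path from clip level up to instance level is shown, and the domain-level and part-level contrasts are omitted. The names `avg_pool_pairs`, `pyramid`, `info_nce`, and `hierarchical_contrast` are hypothetical.

```python
import math


def avg_pool_pairs(seq):
    """Halve the temporal length by averaging adjacent feature vectors.

    Assumption: a stand-in for a learned downsampling module.
    A trailing unpaired vector is dropped.
    """
    return [
        [(a + b) / 2.0 for a, b in zip(seq[i], seq[i + 1])]
        for i in range(0, len(seq) - 1, 2)
    ]


def pyramid(seq):
    """Return the sequence at every granularity, from fine clips to one vector."""
    levels = [seq]
    while len(levels[-1]) > 1:
        levels.append(avg_pool_pairs(levels[-1]))
    return levels


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.1):
    """InfoNCE over a batch: keys[i] is the positive for queries[i]."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(q)


def hierarchical_contrast(view_a, view_b, temperature=0.1):
    """Average InfoNCE over every granularity and every clip position.

    view_a, view_b: two augmented views of the same batch, each a list of
    equal-length sequences of feature vectors.
    """
    pyr_a = [pyramid(s) for s in view_a]
    pyr_b = [pyramid(s) for s in view_b]
    n_levels = len(pyr_a[0])
    total = 0.0
    for lvl in range(n_levels):
        n_clips = len(pyr_a[0][lvl])
        level_loss = 0.0
        for j in range(n_clips):
            qa = [p[lvl][j] for p in pyr_a]
            kb = [p[lvl][j] for p in pyr_b]
            level_loss += info_nce(qa, kb, temperature)
        total += level_loss / n_clips
    return total / n_levels
```

In this sketch the coarsest level of the pyramid holds a single vector per sequence, so the last term of the sum plays the role of an instance-level contrast, while the finer levels contrast corresponding clips. In a real model the per-step features would come from an S2S encoder rather than being supplied directly.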