Paper Title
Sparse Semi-Supervised Action Recognition with Active Learning
Paper Authors
Paper Abstract
Current state-of-the-art methods for skeleton-based action recognition are supervised and rely on labels. This reliance limits performance, owing to the challenges of annotation and the risk of mislabeled data. Unsupervised methods have been introduced; however, they organize sequences into clusters and still require labels to associate the clusters with actions. In this paper, we propose a novel approach for skeleton-based action recognition, called SESAR, that connects these approaches. SESAR leverages information from both unlabeled data and a handful of sequences actively selected for labeling, combining unsupervised training with sparsely supervised guidance. SESAR is composed of two main components: the first learns a latent representation of unlabeled action sequences with an Encoder-Decoder RNN that reconstructs the sequences, and the second performs active learning, selecting the sequences to be labeled based on cluster and classification uncertainty. When the two components are trained simultaneously on skeleton-based action sequences, they yield a robust action recognition system that requires only a handful of labeled samples. We evaluate our system on common datasets with multiple sequences and actions, such as NW UCLA, NTU RGB+D 60, and UWA3D. Our results outperform standalone skeleton-based supervised, unsupervised with cluster identification, and active-learning methods for action recognition when applied to sparsely labeled samples, as low as 1% of the data.
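To make the two components described in the abstract concrete, here is a minimal PyTorch sketch, not the authors' implementation: an Encoder-Decoder RNN that reconstructs unlabeled skeleton sequences, plus a predictive-entropy score standing in for the classification-uncertainty criterion used to pick sequences for labeling. The GRU choice, layer sizes, joint dimensionality, and the entropy-based scoring are illustrative assumptions.

```python
# Sketch of SESAR-style components (assumptions, not the paper's code):
# (1) an Encoder-Decoder RNN reconstructing skeleton sequences,
# (2) an uncertainty score for active selection of sequences to label.
import torch
import torch.nn as nn


class SeqAutoencoder(nn.Module):
    def __init__(self, joint_dim: int, hidden_dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.GRU(joint_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(joint_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, joint_dim)           # reconstruct joints per frame
        self.classifier = nn.Linear(hidden_dim, num_classes)  # trained on the few labeled samples

    def forward(self, x):
        # x: (batch, frames, joint_dim) skeleton sequences
        _, h = self.encoder(x)                  # h: (1, batch, hidden_dim) latent code
        dec_in = torch.zeros_like(x)            # decode from zero inputs conditioned on the code
        dec_out, _ = self.decoder(dec_in, h)
        recon = self.out(dec_out)               # reconstructed sequence
        logits = self.classifier(h.squeeze(0))  # class scores from the latent code
        return recon, logits


def classification_uncertainty(logits):
    # Predictive entropy: higher entropy -> better candidate for labeling.
    probs = torch.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)


# Toy usage: score a pool of unlabeled sequences and pick the most uncertain ones.
model = SeqAutoencoder(joint_dim=75)            # e.g. 25 joints x 3 coordinates
pool = torch.randn(32, 50, 75)                  # 32 sequences, 50 frames each
recon, logits = model(pool)
recon_loss = nn.functional.mse_loss(recon, pool)                # unsupervised reconstruction objective
query_idx = classification_uncertainty(logits).topk(3).indices  # sequences to send for labeling
```

In this sketch the reconstruction loss drives the unsupervised representation, while the entropy scores supply one possible acquisition signal; the paper additionally uses cluster-based uncertainty, which is omitted here for brevity.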