Universal-to-Specific Framework for Complex Action Recognition

Paper Title

Universal-to-Specific Framework for Complex Action Recognition

Authors

Peisen Zhao, Lingxi Xie, Ya Zhang, Qi Tian

Abstract

Video-based action recognition has recently attracted much attention in the field of computer vision. To solve more complex recognition tasks, it has become necessary to distinguish different levels of interclass variations. Inspired by a common flowchart based on the human decision-making process that first narrows down the probable classes and then applies a "rethinking" process for finer-level recognition, we propose an effective universal-to-specific (U2S) framework for complex action recognition. The U2S framework is composed of three subnetworks: a universal network, a category-specific network, and a mask network. The universal network first learns universal feature representations. The mask network then generates attention masks for confusing classes through category regularization based on the output of the universal network. The mask is further used to guide the category-specific network for class-specific feature representations. The entire framework is optimized in an end-to-end manner. Experiments on a variety of benchmark datasets, e.g., the Something-Something, UCF101, and HMDB51 datasets, demonstrate the effectiveness of the U2S framework; i.e., U2S can focus on discriminative spatiotemporal regions for confusing categories. We further visualize the relationship between different classes, showing that U2S indeed improves the discriminability of learned features. Moreover, the proposed U2S model is a general framework and may adopt any base recognition network.
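
The data flow among the three subnetworks described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names (`u2s_forward`, `linear`, `random_matrix`), the single linear layer standing in for each subnetwork, the sigmoid mask, the random weights, and the toy dimensions. The category regularization and the end-to-end training mentioned in the abstract are not modeled; the sketch only shows a universal prediction feeding a mask that reweights region features for a second, category-specific prediction.

```python
import math
import random


def linear(x, w):
    """Multiply a vector x (length d) by a weight matrix w (d rows, k columns)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights standing in for trained ones."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def u2s_forward(features, w_universal, w_mask, w_specific):
    """Toy forward pass; `features` holds one vector per spatiotemporal region."""
    d = len(features[0])

    # Universal network (assumed form): average-pool regions, then classify.
    pooled = [sum(f[i] for f in features) / len(features) for i in range(d)]
    universal_probs = softmax(linear(pooled, w_universal))

    # Mask network (assumed form): score each region from its own features
    # concatenated with the universal class probabilities.
    mask = [sigmoid(linear(f + universal_probs, w_mask)[0]) for f in features]

    # Category-specific network (assumed form): mask-weighted pooling,
    # followed by a second classifier.
    total = sum(mask)
    attended = [
        sum(m * f[i] for m, f in zip(mask, features)) / total for i in range(d)
    ]
    specific_probs = softmax(linear(attended, w_specific))
    return universal_probs, mask, specific_probs


rng = random.Random(0)
d, k, regions = 4, 3, 5  # feature size, number of classes, number of regions
features = random_matrix(regions, d, rng)
universal_probs, mask, specific_probs = u2s_forward(
    features,
    random_matrix(d, k, rng),      # universal classifier weights
    random_matrix(d + k, 1, rng),  # mask scorer weights
    random_matrix(d, k, rng),      # category-specific classifier weights
)
```

In this toy version the mask is the only path by which the universal prediction influences the category-specific prediction, which mirrors the "narrow down, then rethink" ordering in the abstract without claiming to reproduce the authors' architecture.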
