A Comprehensive Study of Deep Video Action Recognition

Paper Title

A Comprehensive Study of Deep Video Action Recognition

Authors

Zhu, Yi; Li, Xinyu; Liu, Chunhui; Zolfaghari, Mohammadreza; Xiong, Yuanjun; Wu, Chongruo; Zhang, Zhi; Tighe, Joseph; Manmatha, R.; Li, Mu

Abstract

Video action recognition is one of the representative tasks for video understanding. Over the last decade, we have witnessed great advancements in video action recognition thanks to the emergence of deep learning. But we have also encountered new challenges, including modeling long-range temporal information in videos, high computational costs, and incomparable results caused by variations in datasets and evaluation protocols. In this paper, we provide a comprehensive survey of over 200 existing papers on deep learning for video action recognition. We first introduce the 17 video action recognition datasets that influenced the design of models. Then we present video action recognition models in chronological order: starting with early attempts at adapting deep learning, then the two-stream networks, followed by the adoption of 3D convolutional kernels, and finally the recent compute-efficient models. In addition, we benchmark popular methods on several representative datasets and release code for reproducibility. Finally, we discuss open problems and shed light on opportunities for video action recognition to facilitate new research ideas.
