Finding Action Tubes with a Sparse-to-Dense Framework

Paper Title

Finding Action Tubes with a Sparse-to-Dense Framework

Authors

Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Limin Wang, Shugong Xu

Abstract

The task of spatiotemporal action detection has attracted increasing attention among researchers. Existing dominant methods solve this problem by relying on short-term information and dense, serial detection on each individual frame or clip. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. There are two key characteristics in this framework: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network, and (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy makes our framework about 7.6 times more efficient than the nearest competitor.
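As a rough illustration of the general "sparse-to-dense" idea, here is a minimal sketch. It assumes an action tube is predicted only at a few sparsely sampled keyframes and then expanded to one box per frame by plain linear interpolation. This is not the paper's DTS module, which samples features dynamically inside the network. The function name `densify_tube` and the `(x1, y1, x2, y2)` box format are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def densify_tube(key_frames: Sequence[int], key_boxes: Sequence[Box],
                 num_frames: int) -> List[Box]:
    """Expand boxes predicted at sparse keyframes into one box per frame.

    Illustrative assumption only (not the paper's DTS module): frames between
    two keyframes get a linearly interpolated box, and frames before the first
    (or after the last) keyframe reuse that keyframe's box.
    """
    if not key_frames or len(key_frames) != len(key_boxes):
        raise ValueError("need at least one keyframe and one box per keyframe")
    if any(b <= a for a, b in zip(key_frames, key_frames[1:])):
        raise ValueError("keyframes must be strictly increasing")

    dense: List[Box] = []
    seg = 0  # index of the keyframe segment [seg, seg + 1] containing frame t
    for t in range(num_frames):
        if t <= key_frames[0]:
            dense.append(tuple(key_boxes[0]))
            continue
        if t >= key_frames[-1]:
            dense.append(tuple(key_boxes[-1]))
            continue
        # Advance to the segment whose right keyframe is at or after t.
        while key_frames[seg + 1] < t:
            seg += 1
        t0, t1 = key_frames[seg], key_frames[seg + 1]
        w = (t - t0) / (t1 - t0)  # interpolation weight in (0, 1]
        b0, b1 = key_boxes[seg], key_boxes[seg + 1]
        dense.append(tuple((1 - w) * c0 + w * c1 for c0, c1 in zip(b0, b1)))
    return dense


if __name__ == "__main__":
    # Boxes at keyframes 0 and 4, expanded to 6 frames.
    for box in densify_tube([0, 4], [(0, 0, 10, 10), (4, 4, 14, 14)], 6):
        print(box)
```

The design favours simplicity: only a handful of keyframes need full detection, and every other frame costs a few arithmetic operations. That is the intuition behind the efficiency gain claimed in the abstract, although the paper's own mechanism differs.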