Multi-Level Temporal Pyramid Network for Action Detection

Paper Title

Multi-Level Temporal Pyramid Network for Action Detection

Authors

Wang, Xiang; Gao, Changxin; Zhang, Shiwei; Sang, Nong

Abstract

Currently, one-stage frameworks are widely applied to temporal action detection, but they still struggle when action instances span a wide range of durations. The reason is that these one-stage detectors, e.g., the Single Shot MultiBox Detector (SSD), extract temporal features using only a single-level layer for each head, which is not discriminative enough to perform classification and regression. In this paper, we propose a Multi-Level Temporal Pyramid Network (MLTPN) to improve the discriminative power of the features. Specifically, we first fuse the features from multiple layers with different temporal resolutions to encode multi-layer temporal information. We then apply a multi-level feature pyramid architecture to the fused features to enhance their discriminative ability. Finally, we design a simple yet effective feature fusion module to fuse the multi-level, multi-scale features. In this way, the proposed MLTPN can learn rich and discriminative features for action instances of different durations. We evaluate MLTPN on two challenging datasets, THUMOS'14 and ActivityNet v1.3. The experimental results show that MLTPN obtains competitive performance on ActivityNet v1.3 and significantly outperforms the state-of-the-art approaches on THUMOS'14.
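
The first step in the abstract, fusing features from layers with different temporal resolutions, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the fusion operators, so the linear resampling, the channel-wise concatenation, the function names `resample_temporal` and `fuse_multi_resolution`, and the feature shapes are all assumptions made for illustration.

```python
import numpy as np


def resample_temporal(feat, target_len):
    """Linearly resample a (channels, time) feature map to target_len steps."""
    _, src_len = feat.shape
    # Place source and target samples on a shared normalized time axis [0, 1].
    src_pos = np.linspace(0.0, 1.0, src_len)
    dst_pos = np.linspace(0.0, 1.0, target_len)
    return np.stack([np.interp(dst_pos, src_pos, row) for row in feat])


def fuse_multi_resolution(features, target_len):
    """Resample each feature map to target_len, then concatenate along channels.

    Assumption: concatenation stands in for whatever fusion the paper uses.
    """
    resampled = [resample_temporal(f, target_len) for f in features]
    return np.concatenate(resampled, axis=0)


# Hypothetical features from three layers: 8 channels at 64, 32, and 16 time steps.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, t)) for t in (64, 32, 16)]

fused = fuse_multi_resolution(layers, target_len=32)
print(fused.shape)  # (24, 32)
```

After fusion, every temporal position carries information from both fine and coarse layers, which is the property the abstract relies on before the feature pyramid is applied.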
