Paper Title
Multi-Level Temporal Pyramid Network for Action Detection
Authors
Abstract
One-stage frameworks are now widely used for temporal action detection, but they still struggle with action instances whose durations span a wide range. The reason is that these one-stage detectors, e.g., the Single Shot MultiBox Detector (SSD), extract temporal features using only a single-level layer for each head, which is not discriminative enough for classification and regression. In this paper, we propose a Multi-Level Temporal Pyramid Network (MLTPN) to improve the discriminative power of the features. Specifically, we first fuse the features from multiple layers with different temporal resolutions to encode multi-layer temporal information. We then apply a multi-level feature pyramid architecture to the fused features to enhance their discriminative ability. Finally, we design a simple yet effective feature fusion module to fuse the multi-level, multi-scale features. In this way, MLTPN can learn rich and discriminative features for action instances of different durations. We evaluate MLTPN on two challenging datasets, THUMOS'14 and ActivityNet v1.3; the experimental results show that MLTPN obtains competitive performance on ActivityNet v1.3 and significantly outperforms state-of-the-art approaches on THUMOS'14.
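As a rough illustration of the first step (fusing features from layers with different temporal resolutions), the sketch below aligns temporal feature maps to a common length and concatenates their channels. The abstract does not say how MLTPN performs this fusion, so the nearest-neighbor upsampling, the concatenation, and the function names `upsample_nearest` and `fuse_multi_resolution` are all assumptions made for illustration, not the authors' method.

```python
# Illustrative sketch only; not the MLTPN implementation.
# A feature map is a list of T time steps; each step is a list of C channel values.

def upsample_nearest(feature_map, target_len):
    """Stretch a temporal feature map to target_len steps by repeating nearest steps."""
    src_len = len(feature_map)
    return [feature_map[i * src_len // target_len] for i in range(target_len)]

def fuse_multi_resolution(feature_maps):
    """Align all maps to the longest temporal length, then concatenate channels per step."""
    target_len = max(len(fm) for fm in feature_maps)
    aligned = [upsample_nearest(fm, target_len) for fm in feature_maps]
    fused = []
    for t in range(target_len):
        step = []
        for fm in aligned:
            step.extend(fm[t])  # channel-wise concatenation (assumed fusion rule)
        fused.append(step)
    return fused

# Hypothetical toy inputs: a fine map (4 steps, 1 channel) and a coarse map (2 steps, 2 channels).
fine = [[1.0], [2.0], [3.0], [4.0]]
coarse = [[10.0, 11.0], [20.0, 21.0]]
fused = fuse_multi_resolution([fine, coarse])
```

Here `fused` has 4 time steps with 3 channels each, so every time step carries information from both the fine and the coarse layer. A real detector would apply learned convolutions on tensors rather than concatenating Python lists.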