Paper Title
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Paper Authors
Paper Abstract
Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action segments in untrimmed videos given only video-level action labels for training. Without boundary information for action segments, existing methods mostly rely on multiple instance learning (MIL), where the predictions for unlabeled instances (i.e., video snippets) are supervised by classifying labeled bags (i.e., untrimmed videos). However, this formulation typically treats the snippets in a video as independent instances, ignoring the underlying temporal structures within and across action segments. To address this problem, we propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods. Our framework entails three segment-centric components: (i) dynamic segment sampling to compensate for the limited contribution of short actions; (ii) intra- and inter-segment attention for modeling action dynamics and capturing temporal dependencies; and (iii) pseudo instance-level supervision for improving action boundary prediction. Furthermore, a multi-step refinement strategy is proposed to progressively improve action proposals over the course of model training. Extensive experiments on THUMOS-14 and ActivityNet-v1.3 demonstrate the effectiveness of our approach, which establishes a new state of the art on both datasets. The code and models are publicly available at https://github.com/boheumd/ASM-Loc.
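To make the MIL formulation that the abstract critiques more concrete, the sketch below shows one common WTAL baseline (not ASM-Loc itself): per-snippet class scores (a class activation sequence of shape T x C) are aggregated into a video-level prediction by top-k mean pooling, then supervised with the video-level label via cross-entropy. The top-k pooling, the ratio `r`, the softmax, and the normalized multi-hot label are assumptions about a typical baseline, and the function names are hypothetical. Note that each snippet enters the pooling independently, so no temporal structure within or across segments is used; this is the limitation ASM-Loc's segment-centric components target.

```python
import math
from typing import List


def topk_mil_video_probs(cas: List[List[float]], r: int = 2) -> List[float]:
    """Hypothetical MIL baseline: pool a T x C class activation sequence into
    video-level class probabilities with top-k mean pooling + softmax.

    k = max(1, T // r) is an assumed choice; snippets are treated as
    independent instances (no temporal modeling).
    """
    num_snippets = len(cas)
    num_classes = len(cas[0])
    k = max(1, num_snippets // r)
    pooled = []
    for c in range(num_classes):
        # Take the k highest snippet scores for class c and average them.
        column = sorted((row[c] for row in cas), reverse=True)
        pooled.append(sum(column[:k]) / k)
    # Numerically stable softmax over classes.
    m = max(pooled)
    exps = [math.exp(p - m) for p in pooled]
    z = sum(exps)
    return [e / z for e in exps]


def mil_video_loss(video_probs: List[float], video_labels: List[int]) -> float:
    """Cross-entropy between video-level probabilities and a normalized
    multi-hot video label (the only supervision available in WTAL)."""
    n = sum(video_labels)
    return -sum((y / n) * math.log(p)
                for p, y in zip(video_probs, video_labels) if y)


if __name__ == "__main__":
    # 4 snippets, 2 classes; the video is labeled with class 0 only.
    cas = [[2.0, 0.1], [1.5, 0.2], [0.1, 0.3], [0.0, 0.4]]
    probs = topk_mil_video_probs(cas, r=2)
    print(probs, mil_video_loss(probs, [1, 0]))
```

With `r=2` and four snippets, `k` is 2, so only the two highest-scoring snippets per class drive the video-level prediction; a short action occupying few snippets therefore contributes little, which is the imbalance that dynamic segment sampling is described as compensating for.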