Paper Title
Self-supervised Video Representation Learning with Cascade Positive Retrieval
Paper Authors
Abstract
Self-supervised video representation learning has been shown to effectively improve downstream tasks such as video retrieval and action recognition. In this paper, we present Cascade Positive Retrieval (CPR), which successively mines positive examples w.r.t. the query for contrastive learning in a cascade of stages. Specifically, CPR exploits multiple views of a query example in different modalities, where an alternative view may help find another positive example that is dissimilar in the query view. We explore the effects of possible CPR configurations in ablations, including the number of mining stages, the top-similar example selection ratio in each stage, and progressive training with an incrementally increasing final Top-k selection. The overall mining quality is measured to reflect the recall across training set classes. CPR reaches a median class mining recall of 83.3%, outperforming previous work by 5.5%. Implementation-wise, CPR is complementary to pretext tasks and can be easily applied to previous work. In the evaluation of pretraining on UCF101, CPR consistently improves existing work and even achieves state-of-the-art R@1 of 56.7% and 24.4% in video retrieval, as well as 83.8% and 54.8% in action recognition, on UCF101 and HMDB51 respectively. The code is available at https://github.com/necla-ml/CPR.
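The cascade mining idea described above can be illustrated with a minimal sketch: each stage ranks the remaining candidates by similarity to the query in one view (modality), keeps only a top fraction, and the final stage's Top-k survivors are treated as mined positives. The function name, the per-stage `ratios` parameter, and the use of cosine similarity over precomputed memory-bank embeddings are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cascade_positive_retrieval(query_views, bank_views, ratios=(0.5, 0.5), top_k=5):
    """Illustrative sketch of cascade positive mining (assumptions, not the paper's code).

    query_views: list of 1-D arrays, one query embedding per view/modality.
    bank_views:  list of 2-D arrays (N x D_v), memory-bank embeddings per view.
    ratios:      fraction of remaining candidates kept after each mining stage.
    top_k:       number of mined positives returned from the final stage.
    """
    n = bank_views[0].shape[0]
    candidates = np.arange(n)  # start from the whole memory bank
    for q, bank, r in zip(query_views, bank_views, ratios):
        # cosine similarity between the query and remaining candidates in this view
        sub = bank[candidates]
        sims = sub @ q / (np.linalg.norm(sub, axis=1) * np.linalg.norm(q) + 1e-8)
        keep = max(1, int(len(candidates) * r))
        candidates = candidates[np.argsort(-sims)[:keep]]  # keep top-ratio most similar
    return candidates[:top_k]  # mined positives for the contrastive loss
```

An alternative view can promote a candidate that the query view alone would rank low, which is the motivation for cascading stages over different modalities.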