Paper Title
Temporal Contrastive Learning with Curriculum
Paper Authors
Paper Abstract
We present ConCur, a contrastive video representation learning method that uses curriculum learning to impose a dynamic sampling strategy in contrastive training. More specifically, ConCur starts the contrastive training with easy positive samples (temporally close and semantically similar clips), and as training progresses, it increases the temporal span, effectively sampling hard positives (temporally distant and semantically dissimilar clips). To learn better context-aware representations, we also propose an auxiliary task of predicting the temporal distance between a positive pair of clips. We conduct extensive experiments on two popular action recognition datasets, UCF101 and HMDB51, on which our proposed method achieves state-of-the-art performance on two benchmark tasks: video action recognition and video retrieval. We explore the impact of encoder backbones and pre-training strategies by using R(2+1)D and C3D encoders and by pre-training on the Kinetics-400 and Kinetics-200 datasets. Moreover, a detailed ablation study shows the effectiveness of each component of our proposed method.
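The curriculum sampling described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear growth schedule, the function names `max_span` and `sample_positive_pair`, and all parameter values are hypothetical. The sketch widens the allowed temporal gap between two clips of the same video as training progresses, and returns that gap so it could serve as a target for a temporal-distance prediction task.

```python
import random


def max_span(step, total_steps, min_span, full_span):
    """Allowed temporal gap (in frames) at a given training step.

    Assumption: the span grows linearly from min_span to full_span.
    The abstract does not specify the schedule.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return int(round(min_span + progress * (full_span - min_span)))


def sample_positive_pair(num_frames, clip_len, step, total_steps,
                         min_span=8, rng=random):
    """Sample start frames of two clips from one video.

    Early in training the clips are temporally close (easy positives);
    later they may be far apart (hard positives).
    Returns (start_a, start_b, gap), where gap is the absolute distance
    in frames between the two start positions.
    """
    last_start = num_frames - clip_len
    if last_start < 0:
        raise ValueError("video is shorter than the clip length")
    # Never start with a span larger than the video allows.
    min_span = min(min_span, last_start)
    span = max_span(step, total_steps, min_span, last_start)
    start_a = rng.randint(0, last_start)
    # Restrict the second clip to a window of +/- span around the first.
    low = max(0, start_a - span)
    high = min(last_start, start_a + span)
    start_b = rng.randint(low, high)
    return start_a, start_b, abs(start_b - start_a)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 500, 1000):
        print(step, sample_positive_pair(300, 16, step, 1000, rng=rng))
```

In a training loop, `step` would be the current iteration and the returned `gap` could be normalized or binned before being used as a label; both choices are left open here because the abstract does not describe them.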