Paper Title

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

Paper Authors

Li Tao, Xueting Wang, Toshihiko Yamasaki

Paper Abstract

We propose a self-supervised method to learn feature representations from videos. A standard approach in traditional self-supervised methods uses positive-negative data pairs to train with contrastive learning strategy. In such a case, different modalities of the same video are treated as positives and video clips from a different video are treated as negatives. Because the spatio-temporal information is important for video representation, we extend the negative samples by introducing intra-negative samples, which are transformed from the same anchor video by breaking temporal relations in video clips. With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations. There are many flexible options in our IIC framework and we conduct experiments by using several different configurations. Evaluations are conducted on video retrieval and video recognition tasks using the learned video representation. Our proposed IIC outperforms current state-of-the-art results by a large margin, such as 16.7% and 9.5% points improvements in top-1 accuracy on UCF101 and HMDB51 datasets for video retrieval, respectively. For video recognition, improvements can also be obtained on these two benchmark datasets. Code is available at https://github.com/BestJuly/Inter-intra-video-contrastive-learning.
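
As an illustration of the intra-negative idea described in the abstract, below is a minimal PyTorch sketch of forming an intra-negative clip by breaking temporal order and scoring it inside an InfoNCE-style contrastive loss. The function names, tensor shapes, and temperature value are assumptions for illustration only; the authors' actual implementation (including their specific temporal transformations and network heads) is in the linked repository.

```python
import torch
import torch.nn.functional as F

def make_intra_negative(clip: torch.Tensor) -> torch.Tensor:
    # clip: (C, T, H, W). Shuffling the frame axis breaks temporal order,
    # one simple way to derive an intra-negative from the anchor video
    # (a hypothetical simplification of the paper's transformation options).
    perm = torch.randperm(clip.shape[1])
    return clip[:, perm]

def inter_intra_contrastive_loss(anchor, positive, inter_negs, intra_neg,
                                 temperature=0.07):
    # anchor, positive, intra_neg: L2-normalized embeddings of shape (D,);
    # inter_negs: (N, D) embeddings of clips from other videos.
    # The positive (e.g., another modality of the same video) is placed at
    # index 0, so the loss is cross-entropy against class 0.
    pos = torch.dot(anchor, positive) / temperature
    negs = inter_negs @ anchor / temperature
    intra = torch.dot(anchor, intra_neg) / temperature
    logits = torch.cat([pos.unsqueeze(0), negs, intra.unsqueeze(0)])
    return F.cross_entropy(logits.unsqueeze(0),
                           torch.zeros(1, dtype=torch.long))
```

In this sketch, the intra-negative embedding would come from passing the shuffled clip through the same encoder as the anchor, which is what makes it a harder negative than clips drawn from unrelated videos.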
