Paper Title
Memory-augmented Dense Predictive Coding for Video Representation Learning
Paper Authors
Paper Abstract
The objective of this paper is self-supervised learning from video, in particular of representations for action recognition. We make the following contributions: (i) We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task. It is trained with a predictive attention mechanism over a set of compressed memories, such that any future state can always be constructed as a convex combination of the condensed representations, allowing multiple hypotheses to be made efficiently. (ii) We investigate visual-only self-supervised video representation learning from RGB frames, from unsupervised optical flow, or from both. (iii) We thoroughly evaluate the quality of the learnt representations on four different downstream tasks: action recognition, video retrieval, learning with scarce annotations, and unintentional action classification. In all cases, we demonstrate state-of-the-art or comparable performance relative to other approaches, while using orders of magnitude less training data.
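The key mechanism the abstract describes is constructing a predicted future state as a convex combination of memory slots via attention: softmax weights are non-negative and sum to one, so the result always lies inside the convex hull of the stored representations. A minimal sketch of this idea follows; the slot count, feature dimension, and variable names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax; outputs are >= 0 and sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sizes: K compressed memory slots, D-dim features.
K, D = 8, 16
rng = np.random.default_rng(0)
memory = rng.standard_normal((K, D))  # compressed memory bank (K x D)
query = rng.standard_normal(D)        # predicted query for the next time step

# Predictive attention over the memory slots.
weights = softmax(memory @ query)     # (K,), non-negative, sums to 1
future_state = weights @ memory       # convex combination of the K slots

# The weights form a valid convex combination by construction.
assert np.isclose(weights.sum(), 1.0) and (weights >= 0).all()
```

Because each candidate future is a weighted mixture over the same memory bank, different queries simply re-weight the slots, which is what makes entertaining multiple hypotheses cheap.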