Representation learning from videos in-the-wild: An object-centric approach

Paper title

Representation learning from videos in-the-wild: An object-centric approach

Authors

Rob Romijnders, Aravindh Mahendran, Michael Tschannen, Josip Djolonga, Marvin Ritter, Neil Houlsby, Mario Lucic

Abstract

We propose a method to learn image representations from uncurated videos. We combine a supervised loss from off-the-shelf object detectors with self-supervised losses that arise naturally from the video-shot-frame-object hierarchy present in each video. We report competitive results on 19 transfer learning tasks of the Visual Task Adaptation Benchmark (VTAB) and on 8 out-of-distribution generalization tasks, and discuss the benefits and shortcomings of the proposed approach. In particular, it improves over the baseline on 18 of the 19 few-shot learning tasks and on all 8 out-of-distribution generalization tasks. Finally, we perform several ablation studies and analyze the impact of the pretrained object detector on performance across this suite of tasks.
