Rescaling Egocentric Vision

Paper title

Rescaling Egocentric Vision

Authors

Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Abstract

This paper introduces the pipeline used to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments using head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete (128% more action segments) annotations of fine-grained actions. This collection enables new challenges such as action detection and evaluating the "test of time", i.e., whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), and unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.

Paper title