Paper title
Rescaling Egocentric Vision
Paper authors
Paper abstract
This paper introduces the pipeline used to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments using head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete (+128% action segments) annotations of fine-grained actions. This collection enables new challenges such as action detection and evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), and unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.