Paper Title

LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos

Paper Authors

Sai Praneeth Reddy Sunkesula, Rishabh Dabral, Ganesh Ramakrishnan

Paper Abstract

Analyzing the interactions between humans and objects in a video involves identifying the relationships between the humans and the objects present in the video. It can be viewed as a specialized version of Visual Relationship Detection, wherein one of the objects must be a human. While traditional methods formulate the problem as inference over a sequence of video segments, we present a hierarchical approach, LIGHTEN, that learns visual features to effectively capture spatio-temporal cues at multiple granularities in a video. Unlike current approaches, LIGHTEN avoids using ground-truth data such as depth maps or 3D human pose, thus improving generalization to non-RGBD datasets as well. Furthermore, we achieve this using only visual features, instead of the commonly used hand-crafted spatial features. We achieve state-of-the-art results on the human-object interaction detection (88.9% and 92.6%) and anticipation tasks of CAD-120, and competitive results on image-based HOI detection on the V-COCO dataset, setting a new benchmark for approaches based on visual features. Code for LIGHTEN is available at https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI
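
The abstract describes a two-level design: per-frame graphs over human and object nodes built from visual (RGB) features only, followed by a temporal hierarchy that aggregates frames into segments and segments into video-level context. Below is a minimal PyTorch sketch of that general idea, not the authors' implementation from the linked repository; the layer sizes, node counts, pooling choices, and the names FrameGraphLayer and HierarchicalTemporalHOI are illustrative assumptions.

# Minimal sketch (not the authors' code) of the idea described in the abstract:
# per-frame human/object nodes carrying only visual features, a graph layer to mix
# spatial cues, then a frame -> segment -> video temporal hierarchy. All dimensions
# and the interaction head are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn


class FrameGraphLayer(nn.Module):
    """Message passing over a fully connected human/object graph in one frame."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * feat_dim, hidden_dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(feat_dim + hidden_dim, feat_dim), nn.ReLU())

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, num_nodes, feat_dim) visual features of human + object boxes
        B, N, D = nodes.shape
        send = nodes.unsqueeze(2).expand(B, N, N, D)   # sender features for every pair
        recv = nodes.unsqueeze(1).expand(B, N, N, D)   # receiver features for every pair
        messages = self.edge_mlp(torch.cat([send, recv], dim=-1)).mean(dim=2)
        return self.node_mlp(torch.cat([nodes, messages], dim=-1))


class HierarchicalTemporalHOI(nn.Module):
    """Graph features per frame, GRU over frames (segment), GRU over segments (video)."""

    def __init__(self, feat_dim: int = 512, hidden_dim: int = 256, num_interactions: int = 10):
        super().__init__()
        self.graph = FrameGraphLayer(feat_dim, hidden_dim)
        self.frame_rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.segment_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_interactions)

    def forward(self, video_nodes: torch.Tensor) -> torch.Tensor:
        # video_nodes: (batch, segments, frames, num_nodes, feat_dim)
        B, S, T, N, D = video_nodes.shape
        nodes = self.graph(video_nodes.reshape(B * S * T, N, D))
        frame_feat = nodes.mean(dim=1).reshape(B * S, T, D)   # pool nodes per frame
        _, seg_h = self.frame_rnn(frame_feat)                 # segment-level summary
        seg_feat = seg_h[-1].reshape(B, S, -1)
        seg_out, _ = self.segment_rnn(seg_feat)               # video-level context
        return self.classifier(seg_out)                       # per-segment HOI logits


if __name__ == "__main__":
    # Toy input: 2 videos, 4 segments, 8 frames, 1 human + 2 object nodes, 512-d features.
    model = HierarchicalTemporalHOI()
    x = torch.randn(2, 4, 8, 3, 512)
    print(model(x).shape)  # torch.Size([2, 4, 10])

In this sketch the frame-level GRU summarizes each segment and a second GRU propagates context across segments, mirroring the multiple-granularity temporal modeling the abstract refers to; the actual graph construction, feature extraction, and prediction heads in LIGHTEN differ in detail.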
