Paper Title
3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans
Paper Authors
Paper Abstract
We present a unified representation for actionable spatial perception: 3D Dynamic Scene Graphs. Scene graphs are directed graphs where nodes represent entities in the scene (e.g., objects, walls, rooms), and edges represent relations (e.g., inclusion, adjacency) among nodes. Dynamic scene graphs (DSGs) extend this notion to represent dynamic scenes with moving agents (e.g., humans, robots), and to include actionable information that supports planning and decision-making (e.g., spatio-temporal relations, topology at different levels of abstraction). Our second contribution is to provide the first fully automatic Spatial PerceptIon eNgine (SPIN) to build a DSG from visual-inertial data. We integrate state-of-the-art techniques for object and human detection and pose estimation, and we describe how to robustly infer object, robot, and human nodes in crowded scenes. To the best of our knowledge, this is the first paper that reconciles visual-inertial SLAM and dense human mesh tracking. Moreover, we provide algorithms to obtain hierarchical representations of indoor environments (e.g., places, structures, rooms) and their relations. Our third contribution is to demonstrate the proposed spatial perception engine in a photo-realistic Unity-based simulator, where we assess its robustness and expressiveness. Finally, we discuss the implications of our proposal on modern robotics applications. 3D Dynamic Scene Graphs can have a profound impact on planning and decision-making, human-robot interaction, long-term autonomy, and scene prediction. A video abstract is available at https://youtu.be/SWbofjhyPzI
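To make the layered data structure concrete, the following is a minimal Python sketch of the kind of node/edge bookkeeping a DSG implies: nodes live on layers at different levels of abstraction, intra-layer edges encode relations such as adjacency, and inter-layer edges encode inclusion. The layer names and the API (`SceneGraph`, `add_node`, `add_edge`) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from enum import IntEnum

# Hypothetical layer ordering, loosely following the paper's hierarchy:
# objects/agents < places < structures/rooms < building.
class Layer(IntEnum):
    OBJECTS_AGENTS = 1
    PLACES = 2
    ROOMS = 3
    BUILDING = 4

@dataclass
class Node:
    node_id: str
    layer: Layer
    attributes: dict = field(default_factory=dict)  # e.g., pose, bounding box, semantic label

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: set = field(default_factory=set)    # directed (src_id, dst_id, relation) triples

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str, relation: str) -> None:
        # Intra-layer edges model relations such as adjacency or traversability;
        # inter-layer edges model inclusion (e.g., a room contains places).
        self.edges.add((src, dst, relation))

# Usage: a room that contains a place, which is adjacent to an object.
g = SceneGraph()
g.add_node(Node("room_1", Layer.ROOMS, {"label": "kitchen"}))
g.add_node(Node("place_3", Layer.PLACES))
g.add_node(Node("object_7", Layer.OBJECTS_AGENTS, {"label": "chair"}))
g.add_edge("room_1", "place_3", "contains")
g.add_edge("place_3", "object_7", "adjacent")
```

In this sketch the dynamic aspect would enter through the node attributes (e.g., a time-stamped pose history for human and robot nodes), which is what distinguishes a DSG from a static scene graph.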