Paper Title
Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation
Paper Authors
Paper Abstract
Navigating complex indoor environments requires a deep understanding of the space the robotic agent is acting in, to correctly inform the agent's navigation process towards the goal location. In recent learning-based navigation approaches, the scene understanding and navigation abilities of the agent are achieved simultaneously by collecting the required experience in simulation. Unfortunately, even if simulators are an efficient tool for training navigation policies, the resulting models often fail when transferred to the real world. One possible solution is to provide the navigation model with mid-level visual representations containing important domain-invariant properties of the scene. But which representations best facilitate the transfer of a model to the real world? How can they be combined? In this work, we address these questions by proposing a benchmark of Deep Learning architectures that combine a range of mid-level visual representations to perform a PointGoal navigation task in a Reinforcement Learning setup. All the proposed navigation models were trained with the Habitat simulator on a synthetic office environment and tested in the same real-world environment using a real robotic platform. To efficiently assess their performance in a real context, we propose a validation tool that generates realistic navigation episodes inside the simulator. Our experiments show that navigation models can benefit from multi-modal input and that our validation tool provides a good estimate of the expected navigation performance in the real world, while saving time and resources. The acquired synthetic and real 3D models of the environment, together with the code of our validation tool built on top of Habitat, are publicly available at the following link: https://iplab.dmi.unict.it/EmbodiedVN/
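The core idea of combining multiple mid-level visual representations as input to a navigation policy can be sketched in simplified form. This is a minimal illustration, not the paper's actual architecture: the modality names, feature sizes, and fusion strategies below are purely hypothetical assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mid-level representations of the same RGB observation
# (e.g. depth, surface normals, semantic segmentation), each already
# encoded into a fixed-size feature vector. Sizes are illustrative.
mid_level_features = {
    "depth": rng.standard_normal(128),
    "normals": rng.standard_normal(128),
    "segmentation": rng.standard_normal(128),
}

def fuse_concat(features: dict) -> np.ndarray:
    """Early fusion: concatenate per-modality feature vectors into one
    input for the downstream policy network."""
    return np.concatenate([features[k] for k in sorted(features)])

def fuse_weighted(features: dict, weights: dict) -> np.ndarray:
    """Weighted fusion: sum modality features scaled by (normally
    learned) per-modality scalar weights; here the weights are fixed."""
    keys = sorted(features)
    return np.sum([weights[k] * features[k] for k in keys], axis=0)

fused = fuse_concat(mid_level_features)
print(fused.shape)  # (384,) -- 3 modalities x 128 features each

weighted = fuse_weighted(
    mid_level_features,
    {"depth": 0.5, "normals": 0.3, "segmentation": 0.2},
)
print(weighted.shape)  # (128,) -- same size as a single modality
```

A benchmark of fusion models like the one described in the abstract would compare such strategies (and learned variants of them) by the resulting navigation performance of the trained policy.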