Title
Active Reward Learning for Co-Robotic Vision Based Exploration in Bandwidth Limited Environments
Authors
Abstract
We present a novel POMDP problem formulation for a robot that must autonomously decide where to go to collect new and scientifically relevant images given a limited ability to communicate with its human operator. From this formulation we derive constraints and design principles for the observation model, reward model, and communication strategy of such a robot, exploring techniques to deal with the very high-dimensional observation space and scarcity of relevant training data. We introduce a novel active reward learning strategy based on making queries to help the robot minimize path "regret" online, and evaluate it for suitability in autonomous visual exploration through simulations. We demonstrate that, in some bandwidth-limited environments, this novel regret-based criterion enables the robotic explorer to collect up to 17% more reward per mission than the next-best criterion.