Paper Title
Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning
Paper Authors
Paper Abstract
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to track the current state. Reinforcement learning approaches, on the other hand, can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that can overcome inaccuracies in the robotic perception/actuation pipeline while requiring minimal interaction with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable and regions where it may be flawed or not well defined. In these uncertain regions, we show that a locally learned policy can be used directly with raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing peg insertion. Videos are available at https://sites.google.com/view/guapo-rl.
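The core mechanism the abstract describes, gating between a model-based controller and a learned policy based on a perception-uncertainty estimate, can be sketched roughly as below. This is a minimal illustration, not the paper's actual interface: the function names, the threshold value, and the assumption that perception returns a pose estimate with a per-dimension standard deviation are all hypothetical.

```python
import numpy as np

# Hypothetical cutoff separating regions where the pose estimate is
# trusted (model-based policy) from uncertain regions (learned policy).
UNCERTAINTY_THRESHOLD = 0.05

def select_action(obs, pose_estimate, pose_std, model_based_policy, learned_policy):
    """Uncertainty-gated policy switching, in the spirit of GUAPO's abstract.

    obs:                raw sensory input (e.g., image features)
    pose_estimate:      perception system's state estimate
    pose_std:           per-dimension uncertainty of that estimate (assumed available)
    model_based_policy: controller that acts on the estimated state
    learned_policy:     RL policy that acts directly on raw sensory inputs
    """
    if np.max(pose_std) < UNCERTAINTY_THRESHOLD:
        # Perception is reliable here: follow the model-based controller.
        return model_based_policy(pose_estimate)
    # Uncertain region: act directly from raw sensory inputs with the learned policy.
    return learned_policy(obs)

# Example usage with toy stand-in callables:
obs = np.zeros(64)                                    # dummy raw observation
pose, pose_std = np.zeros(3), np.array([0.01, 0.02, 0.01])
action = select_action(
    obs, pose, pose_std,
    model_based_policy=lambda p: -p,                  # toy proportional controller
    learned_policy=lambda o: np.zeros(7),             # toy RL policy output
)
```

The design choice this sketch highlights is that the learned policy only needs to be competent locally, inside the uncertain regions, which is what lets the overall method stay sample-efficient.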