Paper Title
Residual Policy Learning for Shared Autonomy
Paper Authors
Abstract
Shared autonomy provides an effective framework for human-robot collaboration that takes advantage of the complementary strengths of humans and robots to achieve common goals. Many existing approaches to shared autonomy make restrictive assumptions that the goal space, environment dynamics, or human policy are known a priori, or are limited to discrete action spaces, preventing those methods from scaling to complicated real-world environments. We propose a model-free, residual policy learning algorithm for shared autonomy that alleviates the need for these assumptions. Our agents are trained to minimally adjust the human's actions such that a set of goal-agnostic constraints are satisfied. We test our method in two continuous control environments: Lunar Lander, a 2D flight control domain, and a 6-DOF quadrotor reaching task. In experiments with human and surrogate pilots, our method significantly improves task performance without any knowledge of the human's goal beyond the constraints. These results highlight the ability of model-free deep reinforcement learning to realize assistive agents suited to continuous control settings with little knowledge of user intent.
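The core idea described above can be sketched as follows: the assistive agent observes the state together with the human's continuous action and adds a bounded residual correction to it, so the human's intent is minimally adjusted. This is only an illustrative sketch; the names `residual_policy`, `assistive_action`, and `MAX_RESIDUAL` are hypothetical, not from the paper, and a real agent would learn the residual with deep reinforcement learning rather than return a fixed value.

```python
import numpy as np

# Bound on the correction magnitude, so the agent only minimally
# adjusts the human's command (illustrative constant, not from the paper).
MAX_RESIDUAL = 0.3

def residual_policy(state, human_action):
    """Stand-in for a learned residual policy.

    A trained agent would map (state, human_action) to a correction;
    here we return zeros as a placeholder.
    """
    return np.zeros_like(human_action)

def assistive_action(state, human_action):
    """Combine the human's action with a clipped residual correction."""
    residual = np.clip(residual_policy(state, human_action),
                       -MAX_RESIDUAL, MAX_RESIDUAL)
    return human_action + residual

# Example: a 2D continuous thrust command, as in the Lunar Lander domain.
state = np.zeros(8)
human_action = np.array([0.5, -0.2])
print(assistive_action(state, human_action))  # placeholder residual is zero
```

Because the residual is goal-agnostic, this structure lets the agent enforce constraints (e.g. avoiding crashes) without ever knowing which landing pad or waypoint the human is aiming for.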