Paper Title
Efficient Recovery Learning using Model Predictive Meta-Reasoning
Authors
Abstract
Operating under real-world conditions is challenging due to the possibility of a wide range of failures induced by execution errors and state uncertainty. In relatively benign settings, such failures can be overcome by retrying or by executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, like opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm called Meta-Reasoning for Skill Learning (MetaReSkill) that monitors the progress of all recovery policies during training and allocates training resources to the recoveries that are likely to improve task performance the most. We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning. Compared to open-loop execution, our experiments show that even a limited amount of recovery learning improves task success substantially, from 71% to 92.4% in simulation and from 75% to 90% on a real robot.
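
The resource-allocation idea in the abstract (monitor each recovery policy's training progress, then spend the next unit of training on the recovery expected to help the task most) can be illustrated with a toy greedy loop. This is only a sketch under stated assumptions and is not the MetaReSkill algorithm itself, whose details the abstract does not give. The names `Recovery`, `predicted_gain`, `allocate`, and `train_fn`, the naive linear extrapolation of progress, and the weighting by failure probability are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Recovery:
    """Hypothetical record for one recovery skill under training."""
    name: str
    failure_prob: float  # assumed probability that the task ends in this failure mode
    success_history: List[float] = field(default_factory=list)  # success rate after each round


def predicted_gain(r: Recovery) -> float:
    """Estimate the task-level gain from one more training round.

    Assumption: progress is extrapolated linearly from the last two
    observations and weighted by how often the failure mode occurs.
    """
    if len(r.success_history) < 2:
        # Too little data to estimate progress: use an optimistic value
        # so that each recovery gets at least two rounds.
        return r.failure_prob
    slope = r.success_history[-1] - r.success_history[-2]
    headroom = 1.0 - r.success_history[-1]
    return r.failure_prob * max(0.0, min(slope, headroom))


def allocate(recoveries: List[Recovery], budget: int,
             train_fn: Callable[[Recovery], float]) -> List[Recovery]:
    """Greedily spend `budget` training rounds, one at a time."""
    for _ in range(budget):
        best = max(recoveries, key=predicted_gain)
        # train_fn runs one round of training and returns the new success rate.
        best.success_history.append(train_fn(best))
    return recoveries
```

In this toy version, a recovery whose success rate has stopped improving receives a predicted gain of zero, so the remaining budget flows to recoveries that are still making progress. A real system would need a more careful progress model and an estimate of failure-mode frequencies from simulation.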