Paper Title

Stability-Guaranteed Reinforcement Learning for Contact-rich Manipulation

Paper Authors

Khader, Shahbaz A., Yin, Hang, Falco, Pietro, Kragic, Danica

Paper Abstract

Reinforcement learning (RL) has had its fair share of success in contact-rich manipulation tasks, but it still lags behind in benefiting from advances in robot control theory such as impedance control and stability guarantees. Recently, the concept of variable impedance control (VIC) was adopted into RL with encouraging results. However, the more important issue of stability remains unaddressed. To clarify the challenge in stable RL, we introduce the term all-the-time-stability, which unambiguously means that every possible rollout will be stability-certified. Our contribution is a model-free RL method that not only adopts VIC but also achieves all-the-time-stability. Building on a recently proposed stable VIC controller as the policy parameterization, we introduce a novel policy search algorithm that is inspired by the Cross-Entropy Method and inherently guarantees stability. Our experimental studies confirm the feasibility and usefulness of the stability guarantee and also feature, to the best of our knowledge, the first successful application of RL with all-the-time-stability on the benchmark problem of peg-in-hole.
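The abstract only names the Cross-Entropy Method (CEM) as the inspiration for the policy search; the paper's actual stability-preserving algorithm is not reproduced here. As background, a minimal sketch of vanilla CEM over a Gaussian search distribution (all names hypothetical; the stability-guaranteeing parameterization of the stable VIC policy is omitted):

```python
import numpy as np

def cem_search(objective, mu, sigma, n_samples=50, n_elite=10, n_iters=30, seed=0):
    """Vanilla Cross-Entropy Method: sample parameters, keep the elite
    fraction, refit the Gaussian, repeat. Purely illustrative."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        # Sample candidate policy parameters from the current Gaussian.
        samples = rng.normal(mu, sigma, size=(n_samples, mu.size))
        scores = np.array([objective(s) for s in samples])
        # Keep the n_elite highest-scoring samples.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the search distribution to the elites.
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Toy objective: maximize the negative squared distance to a target.
target = np.array([1.0, -2.0])
theta = cem_search(lambda t: -np.sum((t - target) ** 2),
                   mu=np.zeros(2), sigma=np.ones(2))
```

In the paper's setting, each sampled parameter vector would instead parameterize a stable VIC controller, so that every candidate rollout is stability-certified by construction.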
