Learning Spring Mass Locomotion: Guiding Policies with a Reduced-Order Model

Paper Title

Learning Spring Mass Locomotion: Guiding Policies with a Reduced-Order Model

Authors

Kevin Green, Yesh Godse, Jeremy Dao, Ross L. Hatton, Alan Fern, Jonathan Hurst

Abstract

In this paper, we describe an approach to achieve dynamic legged locomotion on physical robots which combines existing methods for control with reinforcement learning. Specifically, our goal is a control hierarchy in which the highest-level behaviors are planned through reduced-order models, which describe the fundamental physics of legged locomotion, and lower-level controllers utilize a learned policy that can bridge the gap between the idealized, simple model and the complex, full-order robot. The high-level planner can use a model of the environment and be task specific, while the low-level learned controller can execute a wide range of motions so that it applies to many different tasks. In this letter we describe this learned dynamic walking controller and show that a range of walking motions from reduced-order models can be used as the command and primary training signal for learned policies. The resulting policies do not attempt to naively track the motion (as a traditional trajectory tracking controller would) but instead balance immediate motion tracking with long-term stability. The resulting controller is demonstrated on a human-scale, unconstrained, untethered bipedal robot at speeds up to 1.2 m/s. This letter builds the foundation of a generic, dynamic learned walking controller that can be applied to many different tasks.
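
The abstract does not give the equations of the reduced-order model. As an illustration only, the sketch below assumes a textbook planar spring-mass (spring-loaded inverted pendulum) model: a point mass on a massless linear spring leg with a fixed foot during stance. The function name, parameters, and numeric values are hypothetical and are not taken from the paper.

```python
import math


def slip_stance_acceleration(x, z, foot_x, k, rest_length, mass, g=9.81):
    """Acceleration of the point mass during stance in a planar spring-mass model.

    Assumed model (not from the paper): massless linear spring leg with
    stiffness k and rest length rest_length, foot pinned at (foot_x, 0).
    """
    dx = x - foot_x
    leg_length = math.hypot(dx, z)
    # Positive when the leg is compressed, so the spring pushes the mass away from the foot.
    spring_force = k * (rest_length - leg_length)
    # Project the spring force onto the unit vector from foot to mass, then add gravity.
    ax = spring_force * dx / (leg_length * mass)
    az = spring_force * z / (leg_length * mass) - g
    return ax, az


# Hypothetical values: 30 kg mass, 1 m leg compressed by 0.1 m, foot directly below the mass.
ax, az = slip_stance_acceleration(x=0.0, z=0.9, foot_x=0.0, k=3000.0, rest_length=1.0, mass=30.0)
```

Integrating these accelerations over a stance phase would yield a center-of-mass trajectory of the kind a high-level planner could pass to a learned low-level controller as its command; how the paper generates and uses such trajectories is described in the full text, not here.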
