Robotic Table Tennis with Model-Free Reinforcement Learning

Paper Title

Robotic Table Tennis with Model-Free Reinforcement Learning

Authors

Wenbo Gao, Laura Graesser, Krzysztof Choromanski, Xingyou Song, Nevena Lazic, Pannag Sanketi, Vikas Sindhwani, Navdeep Jaitly

Abstract

We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100 Hz. We demonstrate that evolutionary search (ES) methods acting on CNN-based policy architectures for non-visual inputs and convolving across time learn compact controllers leading to smooth motions. Furthermore, we show that with appropriately tuned curriculum learning on the task and rewards, policies are capable of developing multi-modal styles, specifically forehand and backhand stroke, whilst achieving 80% return rate on a wide range of ball throws. We observe that multi-modality does not require any architectural priors, such as multi-head architectures or hierarchical policies.

Paper Title