Paper Title

Jump-Start Reinforcement Learning

Paper Authors

Ikechukwu Uchendu, Ted Xiao, Yao Lu, Banghua Zhu, Mengyuan Yan, Joséphine Simon, Matthew Bennice, Chuyuan Fu, Cong Ma, Jiantao Jiao, Sergey Levine, Karol Hausman

Paper Abstract

Reinforcement learning (RL) provides a theoretical framework for continuously improving an agent's behavior via trial and error. However, efficiently learning policies from scratch can be very difficult, particularly for tasks with exploration challenges. In such settings, it might be desirable to initialize RL with an existing policy, offline data, or demonstrations. However, naively performing such initialization in RL often works poorly, especially for value-based methods. In this paper, we present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy, and is compatible with any RL approach. In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks: a guide-policy, and an exploration-policy. By using the guide-policy to form a curriculum of starting states for the exploration-policy, we are able to efficiently improve performance on a set of simulated robotic tasks. We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms, particularly in the small-data regime. In addition, we provide an upper bound on the sample complexity of JSRL and show that with the help of a guide-policy, one can improve the sample complexity for non-optimism exploration methods from exponential in horizon to polynomial.
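
The abstract describes JSRL's core loop: the guide-policy controls the beginning of each episode, the exploration-policy takes over from the states the guide reaches, and the hand-off point is moved earlier as the exploration-policy improves. The sketch below illustrates that rollout-and-curriculum structure under assumed interfaces; `env`, `guide_policy`, `explore_policy`, `rl_update`, and `evaluate` are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of the JSRL rollout-and-curriculum loop described in the
# abstract. All interfaces (env, guide_policy, explore_policy, rl_update,
# evaluate) are assumed placeholders, not the paper's actual code.

def jsrl_episode(env, guide_policy, explore_policy, switch_step, horizon):
    """Roll out one episode: the guide-policy acts for the first
    `switch_step` steps, then the exploration-policy takes over."""
    transitions = []
    obs = env.reset()
    for t in range(horizon):
        policy = guide_policy if t < switch_step else explore_policy
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions


def jsrl_train(env, guide_policy, explore_policy, rl_update, evaluate,
               horizon, n_iters, threshold):
    """Curriculum over the hand-off point: the guide-policy initially
    controls most of the episode, and the switch moves one step earlier
    whenever the combined policy's evaluated return clears `threshold`."""
    switch_step = horizon - 1  # guide controls all but the final step at first
    for _ in range(n_iters):
        transitions = jsrl_episode(env, guide_policy, explore_policy,
                                   switch_step, horizon)
        rl_update(explore_policy, transitions)  # any RL method can plug in here
        if switch_step > 0 and evaluate(explore_policy, switch_step) >= threshold:
            switch_step -= 1  # exploration-policy now starts one step earlier
    return explore_policy
```

In this reading, the curriculum of starting states arises implicitly: as `switch_step` shrinks toward zero, the exploration-policy begins from progressively earlier (harder) states, until it controls the entire episode.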
