Paper Title
Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations
Paper Authors
Paper Abstract
Currently, deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. These results often come at the cost of enormous computation, requiring a huge number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficiency of reinforcement learning: hierarchical methods and expert demonstrations. In this paper, we propose a combination of these approaches that allows the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our Forgetful Experience Replay (ForgER) algorithm effectively handles errors in expert data and reduces quality loss when adapting the action space and state representation to the agent's capabilities. Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically extract the sub-goals of complex hierarchical tasks from demonstrations. Our method is general and can be integrated into a variety of off-policy methods. It surpasses all known state-of-the-art RL methods that use expert demonstrations on a range of model environments. A solution based on our algorithm beats all entries in the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
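The abstract does not give implementation details, but the two mechanisms it names (down-weighting noisy expert data and structuring the replay buffer by goal) can be illustrated with a minimal Python sketch. The class and parameter names here (ForgetfulReplayBuffer, expert_decay) are hypothetical, and modeling "forgetting" as an exponential decay of the expert sampling weight is an assumption for illustration, not the authors' exact scheme.

import random
from collections import defaultdict, deque

class ForgetfulReplayBuffer:
    """Sketch of a goal-partitioned replay buffer that gradually
    'forgets' expert demonstrations (assumed mechanism, not the
    paper's exact algorithm)."""

    def __init__(self, capacity_per_goal=10_000, expert_decay=0.999):
        # One FIFO buffer per sub-goal, holding both expert and
        # agent transitions.
        self.buffers = defaultdict(lambda: deque(maxlen=capacity_per_goal))
        self.expert_weight = 1.0      # relative sampling weight of expert data
        self.expert_decay = expert_decay

    def add(self, goal, transition, is_expert):
        self.buffers[goal].append((transition, is_expert))

    def step(self):
        # "Forgetting": down-weight possibly erroneous expert data as the
        # agent's own experience accumulates and becomes more reliable.
        self.expert_weight *= self.expert_decay

    def sample(self, goal, batch_size):
        # Weighted sampling within one sub-goal's buffer: agent transitions
        # keep weight 1.0, expert transitions use the decayed weight.
        data = self.buffers[goal]
        weights = [self.expert_weight if is_expert else 1.0
                   for _, is_expert in data]
        batch = random.choices(data, weights=weights, k=batch_size)
        return [transition for transition, _ in batch]

In use, an off-policy learner would call add() for every transition (tagging demonstration data with is_expert=True), call step() once per training iteration so expert influence fades over time, and draw minibatches per sub-goal with sample(), which mirrors the goal-oriented structuring described above.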