ACDER: Augmented Curiosity-Driven Experience Replay

Paper Title

ACDER: Augmented Curiosity-Driven Experience Replay

Authors

Boyao Li, Tao Lu, Jiayi Li, Ning Lu, Yinghao Cai, Shuo Wang

Abstract

Exploration in environments with sparse feedback remains a challenging research problem in reinforcement learning (RL). When an RL agent explores the environment randomly, exploration efficiency is low, especially in robotic manipulation tasks with high-dimensional continuous state and action spaces. In this paper, we propose a novel method, called Augmented Curiosity-Driven Experience Replay (ACDER), which leverages (i) a new goal-oriented curiosity-driven exploration to encourage the agent to pursue novel and task-relevant states more purposefully and (ii) dynamic initial-state selection as an automatic exploratory curriculum to further improve sample efficiency. Our approach complements Hindsight Experience Replay (HER) by introducing a new way to pursue valuable states. Experiments are conducted on four challenging robotic manipulation tasks with binary rewards: Reach, Push, Pick&Place, and Multi-step Push. The empirical results show that the proposed method significantly outperforms existing methods on the first three basic tasks and also achieves satisfactory performance in multi-step robotic task learning.
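The abstract builds on Hindsight Experience Replay with binary rewards. The following is a minimal sketch of that baseline idea only, not of ACDER's curiosity-driven exploration or initial-state selection, which the abstract does not specify in detail. It assumes HER's "final" relabeling strategy, a distance-threshold binary reward, and a hypothetical transition layout in which `achieved_goal` is the goal reached after taking the action; the names `sparse_reward`, `relabel_final`, and the tolerance `tol` are illustrative, not taken from the paper.

```python
import numpy as np


def sparse_reward(achieved_goal, goal, tol=0.05):
    """Binary reward (assumed form): 0 if within tol of the goal, else -1."""
    return 0.0 if np.linalg.norm(achieved_goal - goal) <= tol else -1.0


def relabel_final(episode, tol=0.05):
    """Relabel an episode with the goal it actually achieved at its last step.

    `episode` is a list of dicts with keys "obs", "action", "achieved_goal",
    and "goal" (a hypothetical layout chosen for this sketch).
    """
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        relabeled.append({
            "obs": t["obs"],
            "action": t["action"],
            "achieved_goal": t["achieved_goal"],
            "goal": new_goal,
            # Recompute the binary reward against the substituted goal.
            "reward": sparse_reward(t["achieved_goal"], new_goal, tol),
        })
    return relabeled


# A failed three-step episode: the object never gets near the original goal.
original_goal = np.array([5.0, 5.0])
episode = [
    {"obs": np.zeros(2), "action": np.ones(2),
     "achieved_goal": np.array([x, 0.0]), "goal": original_goal}
    for x in (0.0, 0.5, 1.0)
]
relabeled = relabel_final(episode)
```

Under these assumptions, every transition of the failed episode earns a reward of -1 against the original goal, while the relabeled copy ends with a successful final transition, which gives the learner a non-trivial signal despite the sparse reward.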
