Title
Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy
Authors
Abstract
Experience replay (ER) improves the data efficiency of off-policy reinforcement learning (RL) algorithms by allowing an agent to store and reuse its past experiences in a replay buffer. While many techniques have been proposed to enhance ER by biasing how experiences are sampled from the buffer, thus far they have not considered strategies for refreshing experiences inside the buffer. In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent's current policy. LiDER consists of three steps: First, LiDER moves an agent back to a past state. Second, from that state, LiDER then lets the agent execute a sequence of actions by following its current policy -- as if the agent were "dreaming" about the past and can try out different behaviors to encounter new experiences in the dream. Third, LiDER stores and reuses the new experience if it turned out better than what the agent previously experienced, i.e., to refresh its memories. LiDER is designed to be easily incorporated into off-policy, multi-worker RL algorithms that use ER; we present in this work a case study of applying LiDER to an actor-critic based algorithm. Results show LiDER consistently improves performance over the baseline in six Atari 2600 games. Our open-source implementation of LiDER and the data used to generate all plots in this work are available at github.com/duyunshu/lucid-dreaming-for-exp-replay.
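The three-step refresh loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the environment interface (`restore`, `step`), the toy `ToyEnv` class, and all function and parameter names are assumptions introduced here for clarity.

```python
def lider_refresh(env, policy, buffer, past_state, past_return, horizon=50):
    """Illustrative sketch of the three LiDER steps (names are assumptions)."""
    # Step 1: move the agent back to a past state (assumes the
    # environment can restore an arbitrary saved state).
    state = env.restore(past_state)

    # Step 2: "dream" -- follow the agent's *current* policy from that state.
    trajectory, new_return = [], 0.0
    for _ in range(horizon):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state))
        new_return += reward
        state = next_state
        if done:
            break

    # Step 3: refresh memory only if the new experience beat the old one.
    if new_return > past_return:
        buffer.extend(trajectory)
    return new_return


class ToyEnv:
    """Deterministic 1-D chain used only to exercise the sketch above:
    action +1 moves right and yields reward 1; the episode ends at pos 5."""

    def restore(self, state):
        self.pos = state
        return state

    def step(self, action):
        self.pos += action
        reward = 1.0 if action == 1 else 0.0
        done = self.pos >= 5
        return self.pos, reward, done
```

On the toy chain, rolling out an always-move-right policy from state 0 earns a return of 5, which exceeds a hypothetical stored return of 2, so the new transitions are written into the buffer; a worse rollout would leave the buffer untouched.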