Paper Title

Learning to Sample with Local and Global Contexts in Experience Replay Buffer

Paper Authors

Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

Paper Abstract

Experience replay, which enables agents to remember and reuse past experience, has played a significant role in the success of off-policy reinforcement learning (RL). To utilize experience replay efficiently, existing sampling methods select more meaningful experiences by imposing priorities on them based on certain metrics (e.g., TD-error). However, they may result in sampling highly biased, redundant transitions, since they compute the sampling rate for each transition independently, without considering its importance relative to other transitions. In this paper, we aim to address this issue by proposing a new learning-based sampling method that computes the relative importance of transitions. To this end, we design a novel permutation-equivariant neural architecture that takes as input contexts from not only the features of each transition (local) but also those of the others (global). We validate our framework, which we refer to as Neural Experience Replay Sampler (NERS), on multiple benchmarks for both continuous and discrete control tasks and show that it significantly improves the performance of various off-policy RL methods. Further analysis confirms that the improvement in sample efficiency is indeed due to NERS sampling diverse and meaningful transitions by considering both local and global contexts.
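
The abstract describes a permutation-equivariant sampler that scores each transition from both its own (local) features and a pooled summary of the other candidates' (global) features. The paper itself should be consulted for the actual NERS architecture; the snippet below is only a minimal PyTorch sketch of that general idea, in which the class name RelativePriorityNet, the choice of per-transition features (e.g., TD-error, reward, timestep), the mean-pooled global context, and the softmax sampling step are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class RelativePriorityNet(nn.Module):
    """Illustrative sketch (not the paper's actual NERS network): scores a
    candidate set of transitions jointly, so each score depends on the
    transition's own features and on a summary of the other candidates."""

    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Per-transition (local) encoder, applied row-wise.
        self.local_encoder = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU())
        # Encoder whose outputs are pooled into a set-level (global) context.
        self.global_encoder = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU())
        # Maps [local, global] to an unnormalized priority score.
        self.score_head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (num_candidates, feature_dim), e.g. hypothetical per-transition
        # quantities such as TD-error, reward, and timestep.
        local = self.local_encoder(features)                    # (N, H)
        global_ctx = self.global_encoder(features).mean(dim=0)  # (H,), order-invariant pooling
        global_ctx = global_ctx.expand_as(local)                # broadcast to (N, H)
        scores = self.score_head(torch.cat([local, global_ctx], dim=-1))
        return scores.squeeze(-1)                               # (N,) unnormalized priorities

# Usage sketch: turn joint scores into sampling probabilities over the candidate set.
net = RelativePriorityNet(feature_dim=3)
candidate_features = torch.randn(128, 3)   # hypothetical candidate transitions from the buffer
probs = torch.softmax(net(candidate_features), dim=0)
batch_idx = torch.multinomial(probs, num_samples=32, replacement=False)
```

Because the local encoder acts row-wise and the global context comes from order-invariant mean pooling, permuting the candidate set permutes the output scores in the same way; this is the permutation-equivariance property the abstract refers to, and it is what lets the scores reflect each transition's importance relative to the others rather than in isolation.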
