Paper Title

Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks

Paper Authors

Ryan Sander, Wilko Schwarting, Tim Seyde, Igor Gilitschenski, Sertac Karaman, Daniela Rus

Paper Abstract

Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent advances in experience replay propose using Mixup (Zhang et al., 2018) to further improve sample efficiency via synthetic sample generation. We build upon this technique with Neighborhood Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that interpolates transitions with their closest neighbors in state-action space. NMER preserves a locally linear approximation of the transition manifold by applying Mixup only between transitions with vicinal state-action features. Under NMER, a given transition's set of state-action neighbors is dynamic and episode-agnostic, in turn encouraging greater policy generalizability via inter-episode interpolation. We combine our approach with recent off-policy deep reinforcement learning algorithms and evaluate on continuous control environments. We observe that NMER improves sample efficiency by an average of 94% (TD3) and 29% (SAC) over baseline replay buffers, enabling agents to effectively recombine previous experiences and learn from limited data.
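To make the interpolation described in the abstract concrete, below is a minimal NumPy sketch of NMER's core sampling step: pick a stored transition, find its nearest neighbor in standardized state-action space, and return a Mixup-style convex combination of the two transitions. The function name nmer_sample, the Beta(alpha, alpha) mixing coefficient, the Euclidean distance, and the per-dimension standardization are illustrative assumptions rather than the authors' exact implementation.

# Illustrative sketch only; hyperparameters and distance metric are assumptions.
import numpy as np

def nmer_sample(states, actions, rewards, next_states, alpha=0.75, rng=None):
    """Return one interpolated transition (s, a, r, s') from the buffer arrays."""
    rng = np.random.default_rng() if rng is None else rng

    # Standardize state-action features so no single dimension dominates the distance.
    sa = np.concatenate([states, actions], axis=1)
    sa = (sa - sa.mean(axis=0)) / (sa.std(axis=0) + 1e-8)

    # Pick an anchor transition uniformly at random.
    i = rng.integers(len(states))

    # Nearest neighbor of the anchor in state-action space (excluding the anchor itself).
    dists = np.linalg.norm(sa - sa[i], axis=1)
    dists[i] = np.inf
    j = int(np.argmin(dists))

    # Mixup: convex combination of the anchor transition and its neighbor.
    lam = rng.beta(alpha, alpha)
    def mix(x, y):
        return lam * x + (1.0 - lam) * y

    return (mix(states[i], states[j]), mix(actions[i], actions[j]),
            mix(rewards[i], rewards[j]), mix(next_states[i], next_states[j]))

In an off-policy learner such as TD3 or SAC, batches of transitions produced this way would augment or replace the uniformly sampled batches drawn from a standard replay buffer.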
