Paper Title
Experience Replay with Likelihood-free Importance Weights
Paper Authors
Paper Abstract
The use of past experiences to accelerate temporal difference (TD) learning of value functions, or experience replay, is a key component in deep reinforcement learning. Prioritization or reweighting of important experiences has been shown to improve the performance of TD learning algorithms. In this work, we propose to reweight experiences based on their likelihood under the stationary distribution of the current policy. Using the corresponding reweighted TD objective, we implicitly encourage small approximation errors on the value function over frequently encountered states. We use a likelihood-free density ratio estimator over the replay buffer to assign the prioritization weights. We apply the proposed approach empirically to two competitive methods, Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic Policy Gradient (TD3), over a suite of OpenAI Gym tasks, and achieve superior sample complexity compared to other baseline approaches.
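
To illustrate the likelihood-free density-ratio idea described in the abstract, the sketch below trains a binary classifier to distinguish freshly collected (roughly on-policy) transitions from replay-buffer transitions and converts its logits into importance weights for a reweighted TD loss. This is a minimal sketch under assumed conventions (PyTorch, a classifier-based ratio estimator, self-normalized minibatch weights), not necessarily the exact estimator or objective used in the paper; the names RatioEstimator, ratio_loss, and reweighted_td_loss are illustrative.

    # Minimal sketch (assumed PyTorch implementation) of a classifier-based,
    # likelihood-free density-ratio estimator and a reweighted TD loss.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RatioEstimator(nn.Module):
        # Binary classifier; its sigmoid output D(x) estimates
        # d_pi(x) / (d_pi(x) + d_buffer(x)) for x = (state, action) features.
        def __init__(self, input_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(input_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x):
            return self.net(x)  # raw logits

        def weight(self, x):
            # D / (1 - D) = exp(logit): estimated ratio d_pi / d_buffer,
            # used as the (unnormalized) prioritization weight.
            return torch.exp(self.forward(x)).squeeze(-1)

    def ratio_loss(estimator, on_policy_x, buffer_x):
        # Logistic loss: label recent on-policy samples 1, replay-buffer samples 0.
        logits_p = estimator(on_policy_x)
        logits_q = estimator(buffer_x)
        return (F.binary_cross_entropy_with_logits(logits_p, torch.ones_like(logits_p))
                + F.binary_cross_entropy_with_logits(logits_q, torch.zeros_like(logits_q)))

    def reweighted_td_loss(td_errors, weights, eps=1e-8):
        # Squared TD errors weighted per sample; weights are self-normalized
        # over the minibatch so the overall loss scale stays comparable.
        w = weights / (weights.mean() + eps)
        return (w * td_errors.pow(2)).mean()

In an off-policy learner such as SAC or TD3, such an estimator would presumably be refit periodically as the policy changes, with estimator.weight(...) evaluated on each sampled minibatch before the critic update; how often to refit and whether to clip or further normalize the weights are design choices the abstract does not specify.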