Paper Title
Sample Efficiency in Sparse Reinforcement Learning: Or Your Money Back
Paper Authors
Paper Abstract
Sparse rewards present a difficult problem in reinforcement learning and may be inevitable in certain domains with complex dynamics, such as real-world robotics. Hindsight Experience Replay (HER) is a recent replay-memory development that allows agents to learn in sparse settings by altering memories so that they appear successful even when they were not. While HER has shown some empirical success, it provides no guarantees about the makeup of the samples drawn from an agent's replay memory. This may result in minibatches that contain only memories with zero-valued rewards, or in agents learning an undesirable policy that completes HER-adjusted goals instead of the actual goal. In this paper, we introduce Or Your Money Back (OYMB), a replay memory sampler designed to work with HER. OYMB improves training efficiency in sparse settings by providing a direct interface to the agent's replay memory that allows control over minibatch makeup, as well as a preferential lookup scheme that prioritizes real-goal memories before HER-adjusted memories. We test our approach on five tasks across three unique environments. Our results show that using HER in combination with OYMB outperforms using HER alone and leads to agents that learn to complete the real goal more quickly.
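The abstract does not include code, so the following is only a minimal sketch of the kind of sampler interface it describes: a replay memory that tags transitions as real-goal or HER-adjusted, lets the caller control minibatch makeup, and looks up real-goal memories before HER-adjusted ones. The class name `PreferentialReplaySampler` and parameters such as `min_real` are illustrative assumptions, not the paper's actual API.

```python
import random


class PreferentialReplaySampler:
    """Illustrative HER-compatible sampler: the caller controls minibatch
    makeup, and real-goal memories are looked up before HER-adjusted ones.
    This is a sketch, not the OYMB implementation."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.real = []      # transitions whose reward reflects the real goal
        self.adjusted = []  # transitions relabeled by HER with a substitute goal

    def add(self, transition, her_adjusted=False):
        """Store a transition, tagged by whether HER relabeled its goal."""
        pool = self.adjusted if her_adjusted else self.real
        pool.append(transition)
        if len(pool) > self.capacity:
            pool.pop(0)  # drop the oldest memory once the pool is full

    def sample(self, batch_size, min_real=8):
        """Return a minibatch with at least `min_real` real-goal memories
        whenever that many are stored; the remainder comes from the
        HER-adjusted pool, falling back to real-goal memories if needed."""
        n_real = min(min_real, batch_size, len(self.real))
        n_adj = min(batch_size - n_real, len(self.adjusted))
        n_real = min(batch_size - n_adj, len(self.real))  # top up if the adjusted pool is small
        batch = random.sample(self.real, n_real) + random.sample(self.adjusted, n_adj)
        random.shuffle(batch)
        return batch


# Example usage with placeholder transitions (dicts stand in for real experience tuples):
sampler = PreferentialReplaySampler()
for step in range(200):
    sampler.add({"step": step, "reward": 0.0}, her_adjusted=True)
for step in range(10):
    sampler.add({"step": step, "reward": 1.0}, her_adjusted=False)
minibatch = sampler.sample(batch_size=32, min_real=8)
```

Under these assumptions, a minibatch is guaranteed to include real-goal memories whenever any exist, which addresses the failure mode the abstract highlights: minibatches made up entirely of zero-reward or HER-adjusted memories.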