Paper Title

Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning

Authors

Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, Jimmy Ba

Abstract

What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? When the desired (test time) goal distribution is too distant to offer a useful learning signal, we argue that the agent should not pursue unobtainable goals. Instead, it should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution. We propose to optimize this objective by having the agent pursue past achieved goals in sparsely explored areas of the goal space, which focuses exploration on the frontier of the achievable goal set. We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks including maze navigation and block stacking.
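The core idea in the abstract — pursuing past achieved goals that lie in sparsely explored regions of the goal space — can be illustrated with a minimal sketch. The function below is a hypothetical, simplified illustration (not the paper's actual implementation): it estimates the density of each previously achieved goal with a naive Gaussian kernel density estimate over the replay buffer, then greedily returns the lowest-density goal, which tends to sit on the frontier of the achievable set. The function name, greedy selection rule, and fixed bandwidth are assumptions for illustration.

```python
import numpy as np

def sample_intrinsic_goal(achieved_goals, bandwidth=0.1):
    """Pick a past achieved goal from a sparsely explored region.

    A naive Gaussian KDE over the buffer of achieved goals estimates
    each goal's density; the minimum-density goal is returned as the
    intrinsic training goal (greedy variant, for illustration only).
    """
    g = np.asarray(achieved_goals, dtype=float)          # shape (N, d)
    # Pairwise squared distances between all achieved goals.
    d2 = ((g[:, None, :] - g[None, :, :]) ** 2).sum(-1)  # shape (N, N)
    # Mean Gaussian kernel value = crude density estimate per goal.
    density = np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)
    # Pursue the achieved goal with the lowest estimated density.
    return g[np.argmin(density)]
```

For example, given a cluster of achieved goals near the origin and a single outlying goal, the outlier has the lowest estimated density and is selected as the next intrinsic goal. A stochastic variant would instead sample goals with probability decreasing in density rather than taking the argmin.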
