Paper Title

Planning and Learning with Adaptive Lookahead

Authors

Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

Abstract

Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.
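To make the idea of a state-dependent planning horizon concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the depth rule (`choose_lookahead`), the toy chain environment, and the heuristic value function are all illustrative assumptions. The sketch only shows the general pattern of picking a tree-search depth per state from a value-based signal, then running a depth-limited search.

```python
class ChainEnv:
    """Hypothetical toy environment: a deterministic chain of states 0..n.

    Moving "right" advances toward state n, which yields reward 1.
    """

    def __init__(self, n=5):
        self.n = n

    def actions(self, state):
        return ["left", "right"]

    def step(self, state, action):
        next_state = min(state + 1, self.n) if action == "right" else max(state - 1, 0)
        reward = 1.0 if next_state == self.n else 0.0
        return next_state, reward


def choose_lookahead(value_gap, max_depth=4):
    """Illustrative depth rule (an assumption, not the paper's criterion):
    states whose value estimates are nearly tied across actions (small gap)
    get a deeper search; clearly separated states get a shallow one."""
    if value_gap > 1.0:
        return 1
    if value_gap > 0.1:
        return 2
    return max_depth


def tree_search(env, state, depth, value_fn, gamma=0.99):
    """Exhaustive depth-limited lookahead; bootstraps with value_fn at the leaves."""
    if depth == 0:
        return value_fn(state), None
    best_val, best_action = float("-inf"), None
    for action in env.actions(state):
        next_state, reward = env.step(state, action)
        sub_val, _ = tree_search(env, next_state, depth - 1, value_fn, gamma)
        val = reward + gamma * sub_val
        if val > best_val:
            best_val, best_action = val, action
    return best_val, best_action


env = ChainEnv(5)
depth = choose_lookahead(value_gap=0.05)          # ambiguous state -> deep search
_, action = tree_search(env, 0, depth, lambda s: s / 5)
```

The trade-off mentioned in the abstract appears here directly: a larger depth makes each action selection exponentially more expensive (the tree branches per step), while a shallower search leans more heavily on the accuracy of the leaf value estimates.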
