Paper Title

What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study

Paper Authors

Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

Paper Abstract

In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancies between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement >50 such "choices" in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250,000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for on-policy training of RL agents.
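To make the idea of "design choices in a unified on-policy framework" concrete, below is a minimal, hypothetical sketch of how such a study could represent each choice as a named configuration field and sample combinations of them for separate training runs. The field names, value ranges, and the `OnPolicyConfig` / `sample_configs` helpers are illustrative assumptions, not the paper's actual choice list or code.

```python
# Hypothetical sketch: each low- or high-level design choice is a config field,
# and an empirical study samples combinations of these fields, one per agent.
# Names and values below are assumptions for illustration only.
from dataclasses import dataclass, asdict
import random


@dataclass
class OnPolicyConfig:
    # Examples of lower-level choices (hypothetical names):
    advantage_estimator: str = "gae"      # e.g. "gae" or "n_step"
    gae_lambda: float = 0.95
    normalize_observations: bool = True
    normalize_advantages: bool = True
    # Examples of higher-level choices:
    policy_loss: str = "ppo_clip"
    learning_rate: float = 3e-4
    num_epochs: int = 10
    minibatch_size: int = 64


def sample_configs(search_space: dict, num_samples: int, seed: int = 0):
    """Randomly sample configurations from a per-choice search space."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        overrides = {name: rng.choice(values) for name, values in search_space.items()}
        yield OnPolicyConfig(**overrides)


if __name__ == "__main__":
    # A tiny search space over a few of the choices above.
    space = {
        "advantage_estimator": ["gae", "n_step"],
        "gae_lambda": [0.9, 0.95, 0.99],
        "normalize_advantages": [True, False],
        "learning_rate": [1e-4, 3e-4, 1e-3],
    }
    for cfg in sample_configs(space, num_samples=3):
        print(asdict(cfg))  # each sampled config would parameterize one training run
```

In a large-scale study of this kind, each sampled configuration corresponds to one trained agent, and performance is then aggregated over environments and random seeds to estimate the impact of individual choices.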
