Paper Title
Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning
Paper Authors
Paper Abstract
Natural policy gradient methods are popular reinforcement learning methods that improve the stability of policy gradient methods by utilizing second-order approximations to precondition the gradient with the inverse of the Fisher information matrix. However, to the best of the authors' knowledge, no study has investigated the effects of different second-order approximations in a comprehensive and systematic manner. To address this, five different second-order approximations were studied and compared across multiple key metrics, including performance, stability, sample efficiency, and computation time. Furthermore, hyperparameters that are not typically acknowledged in the literature are studied, including the effect of different batch sizes and of optimizing the critic network with the natural gradient. Experimental results show that, on average, improved second-order approximations achieve the best performance and that using properly tuned hyperparameters can lead to large improvements in performance and sample efficiency of up to +181%. We also make the code used in this study available at https://github.com/gebob19/natural-policy-gradient-reinforcement-learning.
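For reference, the update the abstract describes can be written in a few lines. Below is a minimal, illustrative Python/NumPy sketch (not taken from the paper's repository; the function and parameter names, as well as the damping term, are our own assumptions) of preconditioning a policy gradient with the inverse of the Fisher information matrix:

    # Minimal sketch of a natural gradient step (illustrative, not the
    # paper's code): theta <- theta + lr * F^{-1} g, where F is the Fisher
    # information matrix and g is the vanilla policy gradient.
    import numpy as np

    def natural_gradient_step(theta, grad, fisher, lr=1e-2, damping=1e-4):
        # Damping keeps the linear solve well-conditioned when F is
        # near-singular (a common practical adjustment, assumed here).
        fisher_damped = fisher + damping * np.eye(fisher.shape[0])
        # Solve F x = g rather than forming the explicit inverse of F.
        nat_grad = np.linalg.solve(fisher_damped, grad)
        return theta + lr * nat_grad

In practice, F is far too large to form and invert exactly for neural network policies, which is precisely why second-order approximations of the kind the paper compares are needed.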