Paper Title


GRAC: Self-Guided and Self-Regularized Actor-Critic

Paper Authors

Lin Shao, Yifan You, Mengyuan Yan, Qingyun Sun, Jeannette Bohg

Abstract


Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network which mitigates the divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main contribution in this work is a self-regularized TD-learning method to address divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization to search for actions associated with higher Q-values in a broad neighborhood. This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network. Taken together, these components define GRAC, a novel self-guided and self-regularized actor critic algorithm. We evaluate GRAC on the suite of OpenAI gym tasks, achieving or outperforming state of the art in every environment tested.
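The zero-order search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the Gaussian perturbation scheme, and all parameters (`sigma`, `n_samples`) are assumptions chosen for clarity. The idea is to sample candidate actions in a neighborhood of the policy's action and keep the one the Q-function scores highest, which makes the improvement step robust to local noise in the Q approximation.

```python
import numpy as np

def zero_order_action_search(q_fn, policy_action, sigma=0.1, n_samples=16, rng=None):
    """Illustrative zero-order search: sample Gaussian perturbations of the
    policy's action and return the candidate with the highest Q-value.
    (Sketch only; GRAC's actual sampling scheme may differ.)"""
    rng = np.random.default_rng(rng)
    # Candidate set: the policy action itself plus perturbed copies,
    # so the result is never worse (under q_fn) than the original action.
    noise = sigma * rng.standard_normal((n_samples, policy_action.shape[-1]))
    candidates = np.vstack([policy_action[None, :], policy_action + noise])
    q_values = np.array([q_fn(a) for a in candidates])
    return candidates[np.argmax(q_values)]
```

For example, with a toy quadratic Q-function peaked at 0.5 in each dimension, the search returns an action whose Q-value is at least that of the starting action, since the original action is always in the candidate set.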
