Paper Title

Robust Reinforcement Learning in Continuous Control Tasks with Uncertainty Set Regularization

Paper Authors

Yuan Zhang, Jianhong Wang, Joschka Boedecker

Paper Abstract

Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations, which excessively restricts its application for real-world robotics. Prior work claimed that adding regularization to the value function is equivalent to learning a robust policy with uncertain transitions. Although the regularization-robustness transformation is appealing for its simplicity and efficiency, it is still lacking in continuous control tasks. In this paper, we propose a new regularizer named $\textbf{U}$ncertainty $\textbf{S}$et $\textbf{R}$egularizer (USR), by formulating the uncertainty set on the parameter space of the transition function. In particular, USR is flexible enough to be plugged into any existing RL framework. To deal with unknown uncertainty sets, we further propose a novel adversarial approach to generate them based on the value function. We evaluate USR on the Real-world Reinforcement Learning (RWRL) benchmark, demonstrating improvements in the robust performance for perturbed testing environments.
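
To make the abstract's idea concrete, below is a minimal, hypothetical sketch of how an uncertainty set on the transition function's parameters might be turned into a robust value target via a value-function-guided adversarial perturbation. This is an illustration based only on the abstract, not the paper's exact algorithm: the names (`DynamicsModel`, `adversarial_value_target`), the epsilon-ball uncertainty set, and the one-gradient-step worst-case approximation are all assumptions.

```python
# Hypothetical sketch (assumed, not the authors' exact method): approximate the
# worst-case Bellman target over an epsilon-ball uncertainty set in the
# parameter space of a learned transition model, using one adversarial
# gradient step guided by the value function.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Toy learned transition model s' = f_theta(s, a)."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
            nn.Linear(64, state_dim),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1))

def adversarial_value_target(value_fn, dynamics, s, a, r,
                             gamma: float = 0.99, epsilon: float = 0.01):
    """Perturb the dynamics parameters one normalized gradient step in the
    direction that lowers the next-state value (an adversarial choice inside
    an epsilon-ball), compute the robust target, then restore the parameters."""
    params = list(dynamics.parameters())
    v_next = value_fn(dynamics(s, a)).mean()
    grads = torch.autograd.grad(v_next, params)
    with torch.no_grad():
        # Adversarial step: move parameters to decrease the value estimate.
        for p, g in zip(params, grads):
            p -= epsilon * g / (g.norm() + 1e-8)
        target = r + gamma * value_fn(dynamics(s, a))
        # Undo the perturbation so training continues from the original model.
        for p, g in zip(params, grads):
            p += epsilon * g / (g.norm() + 1e-8)
    return target

# Usage with toy shapes (all values illustrative):
state_dim, action_dim = 4, 2
value_fn = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, 1))
dyn = DynamicsModel(state_dim, action_dim)
s, a = torch.randn(8, state_dim), torch.randn(8, action_dim)
r = torch.randn(8, 1)
robust_target = adversarial_value_target(value_fn, dyn, s, a, r)
```

Because the robust target is just a drop-in replacement for the ordinary Bellman target, a regularizer built this way can, as the abstract claims for USR, be plugged into any existing value-based or actor-critic RL framework.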
