Paper Title
PPO-UE: Proximal Policy Optimization via Uncertainty-Aware Exploration
Paper Authors
Paper Abstract
Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO could cause an unexpected stability issue in the training phase. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware explorations (UEs) based on a ratio uncertainty level. The proposed PPO-UE is designed to improve convergence speed and performance with an optimized ratio uncertainty level. Through extensive sensitivity analysis by varying the ratio uncertainty level, our proposed PPO-UE considerably outperforms the baseline PPO in Roboschool continuous control tasks.
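As a reading aid, below is a minimal sketch (not the authors' implementation) of the standard PPO clipped surrogate objective that PPO-UE builds on. The ratio_uncertainty helper is a hypothetical illustration of the kind of "ratio uncertainty level" the abstract refers to; the paper's actual definition and exploration mechanism may differ.

import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    # Standard PPO-Clip surrogate (to be maximized).
    # ratio = pi_theta(a|s) / pi_theta_old(a|s), per sample.
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return np.minimum(unclipped, clipped).mean()

def ratio_uncertainty(ratio):
    # Hypothetical measure of how far the policy ratio drifts from 1;
    # used here only to illustrate the idea of a "ratio uncertainty level".
    return np.abs(ratio - 1.0).mean()

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
ratio = np.exp(rng.normal(0.0, 0.1, size=64))   # simulated pi_new / pi_old
advantage = rng.normal(0.0, 1.0, size=64)       # simulated advantage estimates
print(ppo_clip_objective(ratio, advantage), ratio_uncertainty(ratio))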