Paper Title
Related to Double Q-learning
Decorrelated Double Q-learning
Paper Authors
Paper Abstract
Q-learning with value function approximation may perform poorly because of overestimation bias and imprecise estimates. Specifically, the overestimation bias arises from the maximum operator applied to noisy estimates, and it is further exaggerated by bootstrapping from the estimates of subsequent states. Inspired by recent advances in deep reinforcement learning and Double Q-learning, we introduce decorrelated double Q-learning (D2Q). Specifically, we introduce a decorrelating regularization term that reduces the correlation between the value function approximators, which can lead to less biased estimation and lower variance. Experimental results on a suite of MuJoCo continuous control tasks demonstrate that our decorrelated double Q-learning can effectively improve performance.
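The abstract only describes the regularizer at a high level. Below is a minimal, hypothetical PyTorch sketch of one plausible reading: penalizing the squared empirical correlation between two critics' Q-value estimates on a batch, added to the usual TD losses. The `Critic` class, the `decorrelation_penalty` function, and the weight `lam` are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a decorrelation penalty between two critics.
# The exact D2Q regularizer may differ; here we penalize the squared
# empirical (Pearson) correlation between the two Q-networks' outputs.
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Simple Q(s, a) network."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def decorrelation_penalty(q1, q2, eps=1e-8):
    """Squared empirical correlation between two batches of Q-values."""
    q1c = q1 - q1.mean()
    q2c = q2 - q2.mean()
    corr = (q1c * q2c).mean() / (q1c.std() * q2c.std() + eps)
    return corr.pow(2)

if __name__ == "__main__":
    # Combine both critics' TD losses with the decorrelation term.
    state_dim, action_dim, batch = 17, 6, 128   # e.g. a MuJoCo-sized task
    critic1 = Critic(state_dim, action_dim)
    critic2 = Critic(state_dim, action_dim)
    s = torch.randn(batch, state_dim)
    a = torch.randn(batch, action_dim)
    td_target = torch.randn(batch)              # placeholder bootstrapped target
    q1, q2 = critic1(s, a), critic2(s, a)
    lam = 0.1                                   # hypothetical penalty weight
    loss = ((q1 - td_target).pow(2).mean()
            + (q2 - td_target).pow(2).mean()
            + lam * decorrelation_penalty(q1, q2))
    loss.backward()
    print(float(loss))
```

The design intuition, under these assumptions: double Q-learning reduces overestimation only insofar as the two critics make different errors, so explicitly penalizing their correlation keeps the two estimators from collapsing into near-copies of each other.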