Title

On Convergence of Gradient Expected Sarsa($λ$)

Authors

Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

Abstract

We study the convergence of $\mathtt{Expected~Sarsa}(λ)$ with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to $\mathtt{Expected~Sarsa}(λ)$ is unstable for off-policy learning. Furthermore, based on the convex-concave saddle-point framework, we propose a convergent $\mathtt{Gradient~Expected~Sarsa}(λ)$ ($\mathtt{GES}(λ)$) algorithm. Our theoretical analysis shows that $\mathtt{GES}(λ)$ converges to the optimal solution at a linear rate, comparable to existing state-of-the-art gradient temporal difference (GTD) learning algorithms. In addition, we develop a Lyapunov function technique to investigate how the step-size influences the finite-time performance of $\mathtt{GES}(λ)$, a technique that can potentially be generalized to other GTD algorithms. Finally, we conduct experiments to verify the effectiveness of $\mathtt{GES}(λ)$.
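
The abstract describes $\mathtt{GES}(λ)$ only at a high level: a primal-dual (saddle-point) gradient method for the Expected Sarsa(λ) objective under linear function approximation. As a rough illustration of that family of updates, below is a minimal NumPy sketch in the generic GTD(λ)/TDC(λ) style, adapted so the bootstrap target averages next-state features under the target policy, as Expected Sarsa does. The interfaces `env`, `phi`, `pi`, and `mu`, and all step-size choices, are hypothetical; the exact $\mathtt{GES}(λ)$ updates and convergence conditions are derived in the paper.

```python
import numpy as np

def ges_lambda_sketch(env, phi, pi, mu, n_actions, d,
                      alpha=0.01, beta=0.02, gamma=0.99, lam=0.8,
                      n_steps=10_000, seed=0):
    """Hypothetical GTD(lambda)-style primal-dual sketch for an
    Expected Sarsa(lambda) target with linear values q(s, a) = theta @ phi(s, a).

    Assumed interfaces (not from the paper):
      env.reset() -> s;  env.step(a) -> (s_next, r, done)
      phi(s, a)   -> feature vector of length d
      pi(a, s)    -> target-policy probability of action a in state s
      mu(s, rng)  -> action sampled from the behavior policy
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)  # primal weights (value estimate)
    w = np.zeros(d)      # dual weights of the saddle-point formulation
    e = np.zeros(d)      # eligibility trace
    s = env.reset()
    a = mu(s, rng)
    for _ in range(n_steps):
        s_next, r, done = env.step(a)
        # Expected Sarsa bootstraps on the target-policy average of
        # next-state features rather than a sampled next action.
        phi_bar = sum(pi(b, s_next) * phi(s_next, b) for b in range(n_actions))
        phi_sa = phi(s, a)
        boot = 0.0 if done else gamma * (theta @ phi_bar)
        delta = r + boot - theta @ phi_sa   # TD error
        e = gamma * lam * e + phi_sa        # accumulate trace
        # Primal-dual updates in the generic GTD(lambda)/TDC(lambda) pattern;
        # the paper derives the exact GES(lambda) updates and step-sizes.
        theta = theta + alpha * (delta * e - gamma * (1 - lam) * (w @ e) * phi_bar)
        w = w + beta * (delta * e - (w @ phi_sa) * phi_sa)
        if done:
            e = np.zeros(d)
            s = env.reset()
        else:
            s = s_next
        a = mu(s, rng)
    return theta
```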
