Title

On Convergence of Gradient Expected Sarsa($λ$)

Authors

Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

Abstract

We study the convergence of $\mathtt{Expected~Sarsa}(λ)$ with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to $\mathtt{Expected~Sarsa}(λ)$ is unstable for off-policy learning. Furthermore, based on the convex-concave saddle-point framework, we propose a convergent $\mathtt{Gradient~Expected~Sarsa}(λ)$ ($\mathtt{GES}(λ)$) algorithm. Our theoretical analysis shows that $\mathtt{GES}(λ)$ converges to the optimal solution at a linear rate, comparable to existing state-of-the-art gradient temporal difference (GTD) learning algorithms. In addition, we develop a Lyapunov function technique to investigate how the step-size influences the finite-time performance of $\mathtt{GES}(λ)$, a technique that can potentially be generalized to other GTD algorithms. Finally, we conduct experiments to verify the effectiveness of $\mathtt{GES}(λ)$.
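
The abstract describes $\mathtt{GES}(λ)$ only at a high level: a primal-dual (saddle-point) gradient method for the Expected Sarsa(λ) objective under linear function approximation. As a rough illustration of that family of updates, below is a minimal NumPy sketch in the generic GTD(λ)/TDC(λ) style, adapted so the bootstrap target averages next-state features under the target policy, as Expected Sarsa does. The interfaces `env`, `phi`, `pi`, and `mu`, and all step-size choices, are hypothetical; the exact $\mathtt{GES}(λ)$ updates and convergence conditions are derived in the paper.

```python
import numpy as np

def ges_lambda_sketch(env, phi, pi, mu, n_actions, d,
                      alpha=0.01, beta=0.02, gamma=0.99, lam=0.8,
                      n_steps=10_000, seed=0):
    """Hypothetical GTD(lambda)-style primal-dual sketch for an
    Expected Sarsa(lambda) target with linear values q(s, a) = theta @ phi(s, a).

    Assumed interfaces (not from the paper):
      env.reset() -> s;  env.step(a) -> (s_next, r, done)
      phi(s, a)   -> feature vector of length d
      pi(a, s)    -> target-policy probability of action a in state s
      mu(s, rng)  -> action sampled from the behavior policy
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)  # primal weights (value estimate)
    w = np.zeros(d)      # dual weights of the saddle-point formulation
    e = np.zeros(d)      # eligibility trace
    s = env.reset()
    a = mu(s, rng)
    for _ in range(n_steps):
        s_next, r, done = env.step(a)
        # Expected Sarsa bootstraps on the target-policy average of
        # next-state features rather than a sampled next action.
        phi_bar = sum(pi(b, s_next) * phi(s_next, b) for b in range(n_actions))
        phi_sa = phi(s, a)
        boot = 0.0 if done else gamma * (theta @ phi_bar)
        delta = r + boot - theta @ phi_sa   # TD error
        e = gamma * lam * e + phi_sa        # accumulate trace
        # Primal-dual updates in the generic GTD(lambda)/TDC(lambda) pattern;
        # the paper derives the exact GES(lambda) updates and step-sizes.
        theta = theta + alpha * (delta * e - gamma * (1 - lam) * (w @ e) * phi_bar)
        w = w + beta * (delta * e - (w @ phi_sa) * phi_sa)
        if done:
            e = np.zeros(d)
            s = env.reset()
        else:
            s = s_next
        a = mu(s, rng)
    return theta
```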
