Paper Title
Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks
Paper Authors
Paper Abstract
This paper proposes a \emph{fully asynchronous} scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks. Without waiting for any other node in the network, each node can locally update its value function at any time using (possibly delayed) information from its neighbors. This is in sharp contrast to gossip-based schemes, where a pair of nodes must update concurrently. Although the fully asynchronous setting involves a difficult multi-timescale decision problem, we design a novel stochastic average gradient (SAG) based distributed algorithm and develop a push-pull augmented graph approach to prove its exact convergence at a linear rate of $\mathcal{O}(c^k)$, where $c\in(0,1)$ and $k$ increases by one regardless of which node updates. Finally, numerical experiments validate that our method achieves a linear speedup with respect to the number of nodes and is robust to straggler nodes.
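To make the asynchronous updating pattern concrete, below is a minimal sketch of one node's local step, assuming linear value-function approximation $V(s)\approx\phi(s)^\top\theta$, consensus-style mixing of (possibly stale) neighbor parameters, and a SAG-style correction over locally stored transitions. The class name `AsyncPENode`, the weight layout, and the step size are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

class AsyncPENode:
    """Illustrative sketch of a single node's fully asynchronous policy-evaluation update."""

    def __init__(self, node_id, feature_dim, num_samples, in_weights,
                 step_size=0.05, gamma=0.95):
        self.node_id = node_id
        self.theta = np.zeros(feature_dim)                        # local value-function parameter
        self.grad_memory = np.zeros((num_samples, feature_dim))   # SAG table of stored per-sample gradients
        self.grad_sum = np.zeros(feature_dim)                     # running sum of stored gradients
        self.in_weights = in_weights                              # mixing weights for in-neighbors (incl. self)
        self.step_size = step_size
        self.gamma = gamma
        self.num_samples = num_samples

    def local_update(self, neighbor_thetas, sample):
        """One asynchronous update: mix (possibly delayed) neighbor parameters,
        then take a SAG step using a single locally sampled transition.

        neighbor_thetas: dict mapping in-neighbor id -> its last received parameter vector
        sample: (idx, phi_s, phi_next, reward) for one stored transition
        """
        # 1) Mixing step: weighted average of own and stale neighbor parameters.
        mixed = self.in_weights[self.node_id] * self.theta
        for j, theta_j in neighbor_thetas.items():
            mixed += self.in_weights[j] * theta_j

        # 2) SAG step: replace the stored gradient of the sampled transition
        #    and move along the average of all stored gradients.
        idx, phi_s, phi_next, reward = sample
        td_error = reward + self.gamma * phi_next @ mixed - phi_s @ mixed
        new_grad = -td_error * phi_s          # TD(0) semi-gradient for this transition
        self.grad_sum += new_grad - self.grad_memory[idx]
        self.grad_memory[idx] = new_grad
        self.theta = mixed - self.step_size * self.grad_sum / self.num_samples
        return self.theta
```

Note that on directed graphs a push component (e.g., out-degree scaling of the transmitted quantities, as in push-pull schemes) would also be required for exact convergence; it is omitted here for brevity.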