Paper Title
Distributed Cooperative Multi-Agent Reinforcement Learning with Directed Coordination Graph
Paper Authors
Paper Abstract
Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks usually assume undirected coordination and communication graphs, and estimate a global reward via consensus algorithms for policy evaluation. Such a framework may incur expensive communication costs and exhibit poor scalability due to the requirement of global consensus. In this work, we study MARL with directed coordination graphs and propose a distributed RL algorithm in which local policy evaluation is based on local value functions. The local value function of each agent is obtained through local communication with its neighbors over a directed learning-induced communication graph, without using any consensus algorithm. A zeroth-order optimization (ZOO) approach based on parameter perturbation is employed for gradient estimation. By comparing with existing ZOO-based RL algorithms, we show that the proposed distributed RL algorithm guarantees high scalability. A distributed resource allocation example illustrates the effectiveness of the algorithm.
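To illustrate the gradient-estimation step mentioned in the abstract, the following is a minimal sketch of a generic two-point zeroth-order (ZOO) gradient estimator based on parameter perturbation. It is not the paper's algorithm; the callable `local_return`, the smoothing radius `delta`, and the sample count `num_samples` are hypothetical placeholders for an agent's rollout-based local return and tuning parameters.

```python
# Minimal sketch (assumed, not the authors' implementation) of two-point
# zeroth-order gradient estimation via random parameter perturbation.
import numpy as np

def zoo_gradient(local_return, theta, delta=0.05, num_samples=16):
    """Estimate grad_theta local_return(theta) without analytic gradients.

    local_return: callable mapping a parameter vector to a scalar local value
                  (e.g., a rollout-based estimate of one agent's local return).
    theta:        current policy parameter vector of a single agent.
    delta:        perturbation radius (smoothing parameter).
    num_samples:  number of random perturbation directions to average over.
    """
    d = theta.size
    grad = np.zeros(d)
    for _ in range(num_samples):
        u = np.random.randn(d)
        u /= np.linalg.norm(u)                    # direction on the unit sphere
        j_plus = local_return(theta + delta * u)  # perturbed evaluation
        j_minus = local_return(theta - delta * u) # opposite perturbation
        grad += (j_plus - j_minus) / (2.0 * delta) * d * u
    return grad / num_samples

# Toy usage: gradient ascent on a concave quadratic "local return".
if __name__ == "__main__":
    target = np.ones(4)
    local_return = lambda th: -np.sum((th - target) ** 2)
    theta = np.zeros(4)
    for _ in range(200):
        theta += 0.05 * zoo_gradient(local_return, theta)
    print(theta)  # should approach `target`
```

In the distributed setting described by the paper, each agent would evaluate such a perturbed local return using only information exchanged with its neighbors over the directed communication graph, rather than a globally agreed reward.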