Paper Title

GCS: Graph-based Coordination Strategy for Multi-Agent Reinforcement Learning

Authors

Jingqing Ruan, Yali Du, Xuantang Xiong, Dengpeng Xing, Xiyun Li, Linghui Meng, Haifeng Zhang, Jun Wang, Bo Xu

Abstract

Many real-world scenarios involve a team of agents that have to coordinate their policies to achieve a shared goal. Previous studies mainly focus on decentralized control to maximize a common reward and barely consider the coordination among control policies, which is critical in dynamic and complicated environments. In this work, we propose factorizing the joint team policy into a graph generator and graph-based coordinated policy to enable coordinated behaviours among agents. The graph generator adopts an encoder-decoder framework that outputs directed acyclic graphs (DAGs) to capture the underlying dynamic decision structure. We also apply the DAGness-constrained and DAG depth-constrained optimization in the graph generator to balance efficiency and performance. The graph-based coordinated policy exploits the generated decision structure. The graph generator and coordinated policy are trained simultaneously to maximize the discounted return. Empirical evaluations on Collaborative Gaussian Squeeze, Cooperative Navigation, and Google Research Football demonstrate the superiority of the proposed method.
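The abstract does not spell out the form of the DAGness constraint; a standard differentiable acyclicity measure that such constraints typically build on is the NOTEARS-style trace-exponential h(A) = tr(e^{A∘A}) − d, which is zero exactly when the weighted adjacency matrix A contains no directed cycle. The sketch below illustrates that measure and how a coordinated policy could consume the generated DAG; the function names (`dagness`, `topological_order`) and the toy 3-agent graphs are illustrative assumptions, not code from the paper.

```python
# A minimal sketch, assuming a NOTEARS-style acyclicity measure
# h(A) = tr(exp(A ∘ A)) - d, which is 0 iff the weighted graph A is a DAG.
# `dagness`, `topological_order`, and the toy matrices are illustrative,
# not taken from the GCS paper.
import numpy as np
from scipy.linalg import expm


def dagness(adjacency: np.ndarray) -> float:
    """Return h(A) = tr(e^{A∘A}) - d; equals 0 exactly when A is acyclic."""
    d = adjacency.shape[0]
    # The Hadamard square makes entries non-negative, so cycle weights cannot cancel.
    return float(np.trace(expm(adjacency * adjacency)) - d)


def topological_order(adjacency: np.ndarray) -> list:
    """Kahn's algorithm: order agents so every parent precedes its children."""
    d = adjacency.shape[0]
    indegree = (adjacency != 0).sum(axis=0)
    ready = [i for i in range(d) if indegree[i] == 0]
    order = []
    while ready:
        i = ready.pop()
        order.append(i)
        for j in np.flatnonzero(adjacency[i]):
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(int(j))
    return order


# Toy 3-agent decision graph: edges 0 -> 1 -> 2 form a DAG ...
acyclic = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 0.0, 0.0]])
# ... while adding 2 -> 0 closes a directed cycle.
cyclic = acyclic.copy()
cyclic[2, 0] = 1.0

print(dagness(acyclic))            # ~0.0: admissible decision structure
print(dagness(cyclic))             # > 0: would be penalized during training
print(topological_order(acyclic))  # [0, 1, 2]: execution order for the policy
```

Because h(A) and its gradient are smooth, the acyclicity condition h(A) = 0 can be enforced as a penalty or augmented-Lagrangian term while the graph generator is trained end-to-end; the topological order then tells each agent which previously chosen actions it may condition on.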
