Title

Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning

Authors

Sheng Li, Jayesh K. Gupta, Peter Morales, Ross Allen, Mykel J. Kochenderfer

Abstract

Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains due to the size of joint action spaces. Coordination graph based formalizations allow reasoning about the joint action based on the structure of interactions. However, they often require domain expertise in their design. This paper introduces the deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module for inferring the dynamic coordination graph structure, which is then used by a graph neural network based module to learn to implicitly reason about the joint actions or values. DICG allows learning the tradeoff between full centralization and decentralization via standard actor-critic methods to significantly improve coordination for domains with a large number of agents. We apply DICG to both centralized-training-centralized-execution and centralized-training-decentralized-execution regimes. We demonstrate that DICG solves the relative overgeneralization pathology in predator-prey tasks and outperforms various MARL baselines on the challenging StarCraft II Multi-agent Challenge (SMAC) and traffic junction environments.
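To make the two-module design described in the abstract concrete, here is a minimal PyTorch sketch, not the authors' implementation: an attention module infers a soft adjacency matrix (the implicit coordination graph), and graph-convolution-style message passing mixes agent embeddings along that graph before per-agent action logits. All names and dimensions here (DICGSketch, embed_dim, n_mp_layers, and so on) are illustrative assumptions; the paper's actual architecture and actor-critic wiring differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DICGSketch(nn.Module):
    """Sketch of the DICG pipeline: embed per-agent observations, infer a
    soft coordination graph with self-attention, then pass messages along
    that graph before a per-agent output head."""

    def __init__(self, obs_dim: int, embed_dim: int = 64,
                 n_actions: int = 5, n_mp_layers: int = 2):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, embed_dim)   # per-agent embedding
        self.query = nn.Linear(embed_dim, embed_dim, bias=False)
        self.key = nn.Linear(embed_dim, embed_dim, bias=False)
        # Message-passing layers applied over the inferred graph.
        self.mp_layers = nn.ModuleList(
            [nn.Linear(embed_dim, embed_dim) for _ in range(n_mp_layers)]
        )
        self.policy_head = nn.Linear(embed_dim, n_actions)  # per-agent logits

    def forward(self, obs: torch.Tensor):
        # obs: (n_agents, obs_dim)
        h = F.relu(self.encoder(obs))                       # (n_agents, embed_dim)
        # Implicit coordination graph: adjacency[i, j] is the attention
        # weight of agent i on agent j, recomputed every step (dynamic).
        scores = self.query(h) @ self.key(h).T / (h.shape[-1] ** 0.5)
        adjacency = F.softmax(scores, dim=-1)               # (n_agents, n_agents)
        # Propagate embeddings along the inferred graph; the residual
        # connection keeps each agent's own information.
        for layer in self.mp_layers:
            h = F.relu(layer(adjacency @ h)) + h
        return self.policy_head(h), adjacency


if __name__ == "__main__":
    n_agents, obs_dim = 8, 16
    model = DICGSketch(obs_dim)
    logits, graph = model(torch.randn(n_agents, obs_dim))
    print(logits.shape, graph.shape)  # torch.Size([8, 5]) torch.Size([8, 8])
```

Because the adjacency matrix is soft rather than a hand-designed binary graph, gradients can adjust how much each agent attends to the others, which is one way to read the abstract's claim of learning the tradeoff between full centralization and decentralization.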
