Paper title
FCMNet: Full Communication Memory Net for Team-Level Cooperation in Multi-Agent Systems
Paper authors
Paper abstract
Decentralized cooperation in partially observable multi-agent systems requires effective communication among agents. To support this effort, this work focuses on the class of problems where global communication is available but may be unreliable, thus precluding differentiable communication learning methods. We introduce FCMNet, a reinforcement-learning-based approach that allows agents to simultaneously learn a) an effective multi-hop communication protocol and b) a common, decentralized policy that enables team-level decision-making. Specifically, our proposed method uses the hidden states of multiple directional recurrent neural networks as communication messages among agents. Using a simple multi-hop topology, we endow each agent with the ability to receive information sequentially encoded by every other agent at each time step, leading to improved global cooperation. We demonstrate FCMNet on a challenging set of StarCraft II micromanagement tasks with shared rewards, as well as a collaborative multi-agent pathfinding task with individual rewards. Our comparison results show that FCMNet outperforms state-of-the-art communication-based reinforcement learning methods in all StarCraft II micromanagement tasks, and value decomposition methods in certain tasks. We further investigate the robustness of FCMNet under realistic communication disturbances, such as random message loss or binarized messages (i.e., non-differentiable communication channels), to showcase FCMNet's potential applicability to robotic tasks under a variety of real-world conditions.
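
The communication idea in the abstract (recurrent hidden states passed from agent to agent along a chain, in more than one direction) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The plain tanh cell, the two-direction chain, the single time step, and every name below (`rnn_cell`, `communicate`, `drop_prob`, `binarize`) are assumptions made for illustration only, and the weights are random rather than learned.

```python
import numpy as np


def rnn_cell(obs, h_prev, W_in, W_h):
    """Plain tanh recurrent cell (an assumption; the paper may use another cell)."""
    return np.tanh(obs @ W_in + h_prev @ W_h)


def communicate(observations, W_in, W_h, drop_prob=0.0, binarize=False, rng=None):
    """One time step of chain communication in two directions.

    observations: array of shape (n_agents, obs_dim)
    W_in:         array of shape (2, obs_dim, hidden_dim), one matrix per direction
    W_h:          array of shape (2, hidden_dim, hidden_dim)
    Returns per-agent features of shape (n_agents, 2 * hidden_dim).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n_agents = observations.shape[0]
    hidden_dim = W_h.shape[-1]
    features = np.zeros((n_agents, 2 * hidden_dim))

    # Direction 0 walks the chain first-to-last, direction 1 last-to-first.
    orders = (range(n_agents), reversed(range(n_agents)))
    for d, order in enumerate(orders):
        message = np.zeros(hidden_dim)  # the first agent in a chain receives nothing
        for i in order:
            # The agent encodes its own observation together with the incoming message.
            h = rnn_cell(observations[i], message, W_in[d], W_h[d])
            features[i, d * hidden_dim:(d + 1) * hidden_dim] = h

            # The hidden state becomes the message sent to the next agent.
            message = h
            if binarize:
                # Hypothetical binarized channel: keep only the sign of each entry.
                message = np.where(message >= 0, 1.0, -1.0)
            if rng.random() < drop_prob:
                # Hypothetical message loss: the next agent receives zeros.
                message = np.zeros(hidden_dim)
    return features
```

In this sketch, each agent's feature vector would feed a shared policy network. With `drop_prob=0.0`, the last agent in a chain receives information encoded by every agent before it. With `drop_prob=1.0`, every agent falls back to features computed from its own observation alone.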