Title

Toward multi-target self-organizing pursuit in a partially observable Markov game

Authors

Sun, Lijun, Chang, Yu-Cheng, Lyu, Chao, Shi, Ye, Shi, Yuhui, Lin, Chin-Teng

Abstract

The multiple-target self-organizing pursuit (SOP) problem has wide applications and has been considered a challenging self-organization game for distributed systems, in which intelligent agents cooperatively pursue multiple dynamic targets with partial observations. This work proposes a framework for decentralized multi-agent systems to improve their implicit coordination capabilities in search and pursuit. We model a self-organizing system as a partially observable Markov game (POMG) characterized by large scale, decentralization, partial observation, and the absence of communication. The proposed distributed algorithm, fuzzy self-organizing cooperative coevolution (FSC2), is then leveraged to resolve the three challenges in multi-target SOP: distributed self-organizing search (SOS), distributed task allocation, and distributed single-target pursuit. FSC2 includes a coordinated multi-agent deep reinforcement learning (MARL) method that enables homogeneous agents to learn natural SOS patterns. Additionally, we propose a fuzzy-based distributed task allocation method, which locally decomposes multi-target SOP into several single-target pursuit problems. The cooperative coevolution principle is employed to coordinate the distributed pursuers in each single-target pursuit problem. As a result, the uncertainties arising from inherent partial observation and distributed decision-making in the POMG can be alleviated. The experimental results demonstrate that, by decomposing the SOP task, FSC2 achieves superior performance compared with other implicit coordination policies fully trained by general MARL algorithms. The scalability of FSC2 is demonstrated by up to 2048 FSC2 agents performing efficient multi-target SOP with capture rates of almost 100 percent. Empirical analyses and ablation studies verify the interpretability, rationality, and effectiveness of the component algorithms in FSC2.
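The abstract does not spell out the fuzzy allocation rule. The Python sketch below is a rough illustration of how fuzzy memberships could split a multi-target pursuit into single-target groups, with each pursuer deciding using only what it observes. It rests on several assumptions: 2-D positions, a fixed view radius standing in for partial observation, and fuzzy c-means-style memberships computed over distances. The functions `fuzzy_memberships` and `allocate` and the parameters `view_radius` and `m` are hypothetical. This is not the FSC2 allocation method from the paper.

```python
import math
from collections import defaultdict


def fuzzy_memberships(pursuer, targets, m=2.0, eps=1e-9):
    """Fuzzy c-means-style membership of one pursuer in each visible target.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)); the memberships sum to 1.
    Hypothetical illustration, not the FSC2 rule.
    """
    # Clamp distances away from zero to avoid division by zero.
    dists = [max(math.dist(pursuer, t), eps) for t in targets]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** p for d_k in dists) for d_j in dists]


def allocate(pursuers, targets, view_radius, m=2.0):
    """Locally decompose multi-target pursuit into single-target groups.

    pursuers: list of (x, y) positions.
    targets: dict mapping target id -> (x, y) position.
    Returns a dict mapping target id -> list of pursuer indices; the key
    None collects pursuers that see no target and keep searching.
    """
    groups = defaultdict(list)
    for i, pos in enumerate(pursuers):
        # Assumed partial observation: only targets within view_radius count.
        visible = [tid for tid, tpos in targets.items()
                   if math.dist(pos, tpos) <= view_radius]
        if not visible:
            groups[None].append(i)  # no target in view: continue searching
            continue
        u = fuzzy_memberships(pos, [targets[tid] for tid in visible], m)
        # Commit to the target with the highest membership degree.
        best = max(range(len(visible)), key=lambda j: u[j])
        groups[visible[best]].append(i)
    return dict(groups)


if __name__ == "__main__":
    pursuers = [(0, 0), (1, 0), (9, 9), (20, 20)]
    targets = {"A": (0, 1), "B": (10, 10)}
    print(allocate(pursuers, targets, view_radius=5.0))
    # prints {'A': [0, 1], 'B': [2], None: [3]}
```

With this particular membership function, the argmax is simply the nearest visible target. The graded membership values only become informative when they are combined with other cues, such as how many teammates already chase a target. A real allocation scheme would likely differ from this sketch in exactly that respect.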