Paper Title

Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy

Paper Authors

Hongzhi Hua, Kaigui Wu, Guixuan Wen

Paper Abstract

Multi-agent deep reinforcement learning has been applied to a variety of complex problems with either discrete or continuous action spaces and has achieved great success. However, most real-world environments cannot be described by discrete action spaces or continuous action spaces alone, and few works have applied deep reinforcement learning (DRL) to multi-agent problems with hybrid action spaces. Therefore, we propose a novel algorithm, Deep Multi-Agent Hybrid Soft Actor-Critic (MAHSAC), to fill this gap. The algorithm follows the centralized training with decentralized execution (CTDE) paradigm and extends the Soft Actor-Critic (SAC) algorithm to handle hybrid action space problems in multi-agent environments based on maximum entropy. Our experiments run on a simple multi-agent particle world with continuous observations and a discrete action space, along with some basic simulated physics. The experimental results show that MAHSAC achieves good training speed, stability, and anti-interference ability, and it outperforms an existing independent deep hybrid learning method in both cooperative and competitive scenarios.
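As background for the maximum-entropy formulation the abstract refers to, the objective optimized by SAC (and, per the abstract, extended by MAHSAC to the multi-agent hybrid-action setting) augments the expected return with a policy-entropy term; in standard notation, not taken from this paper:

J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \big[ r(s_t, a_t) + \alpha \, \mathcal{H}(\pi(\cdot \mid s_t)) \big]

where \alpha is the temperature coefficient that trades off reward maximization against the entropy \mathcal{H} of the policy \pi.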
