Paper Title
R-MADDPG for Partially Observable Environments and Limited Communication
Paper Authors
Paper Abstract
There are several real-world tasks that would benefit from applying multiagent reinforcement learning (MARL) algorithms, including coordination among self-driving cars. The real world presents challenging conditions for multiagent learning systems, such as its partially observable and nonstationary nature. Moreover, if agents must share a limited resource (e.g., network bandwidth), they must all learn how to coordinate its use. This paper introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for handling multiagent coordination under partially observable settings and limited communication. We investigate the effects of recurrency on the performance and communication usage of a team of agents. We demonstrate that the resulting framework learns time dependencies for sharing missing observations, handling resource limitations, and developing different communication patterns among agents.
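The abstract describes a deep recurrent multiagent actor-critic architecture. The sketch below illustrates the general shape of such a design, assuming PyTorch: a recurrent actor that acts from a memory of past local observations, and a centralized recurrent critic over joint observations and actions. All class names, layer sizes, and the choice of an LSTM are illustrative assumptions, not the paper's released code.

```python
# Illustrative sketch only (hypothetical names/dimensions), in the spirit of a
# recurrent MADDPG-style setup: LSTM memory lets agents act on remembered
# observations when current ones are missing or communication is limited.
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Maps a local observation sequence plus LSTM state to actions."""
    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden carries memory across steps.
        out, hidden = self.lstm(obs_seq, hidden)
        return torch.tanh(self.head(out)), hidden

class RecurrentCritic(nn.Module):
    """Centralized critic over all agents' observations and actions, with memory."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(joint_obs_dim + joint_act_dim, hidden_dim,
                            batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, joint_obs_seq, joint_act_seq, hidden=None):
        x = torch.cat([joint_obs_seq, joint_act_seq], dim=-1)
        out, hidden = self.lstm(x, hidden)
        return self.head(out), hidden

# Smoke test with made-up sizes: 2 agents, obs_dim=8, act_dim=2 each.
actor = RecurrentActor(obs_dim=8, act_dim=2)
critic = RecurrentCritic(joint_obs_dim=16, joint_act_dim=4)
acts, _ = actor(torch.randn(1, 5, 8))                    # one agent, 5 timesteps
q, _ = critic(torch.randn(1, 5, 16), torch.randn(1, 5, 4))
print(acts.shape, q.shape)                               # (1, 5, 2) (1, 5, 1)
```

The centralized-critic/decentralized-actor split follows the standard MADDPG training scheme; the recurrent state is the added ingredient that lets the learned policies exploit the time dependencies the abstract refers to.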