Paper Title
Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
This paper considers multi-agent reinforcement learning (MARL) tasks where agents receive a shared global reward at the end of an episode. The delayed nature of this reward affects the ability of the agents to assess the quality of their actions at intermediate time-steps. This paper focuses on developing methods to learn a temporal redistribution of the episodic reward to obtain a dense reward signal. Solving such MARL problems requires addressing two challenges: identifying (1) relative importance of states along the length of an episode (along time), and (2) relative importance of individual agents' states at any single time-step (among agents). In this paper, we introduce Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning (AREL) to address these two challenges. AREL uses attention mechanisms to characterize the influence of actions on state transitions along trajectories (temporal attention), and how each agent is affected by other agents at each time-step (agent attention). The redistributed rewards predicted by AREL are dense, and can be integrated with any given MARL algorithm. We evaluate AREL on challenging tasks from the Particle World environment and the StarCraft Multi-Agent Challenge. AREL results in higher rewards in Particle World, and improved win rates in StarCraft compared to three state-of-the-art reward redistribution methods. Our code is available at https://github.com/baicenxiao/AREL.
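The abstract describes a two-level attention architecture: temporal attention over each agent's trajectory and agent attention within each time-step, whose output is a dense per-step reward. Below is a minimal, hypothetical sketch of such an agent-temporal attention block in PyTorch. The class name, layer choices, and tensor layout are illustrative assumptions for exposition only, not the authors' implementation; the actual code is in the linked repository.

```python
# Hypothetical sketch of an agent-temporal attention block for episodic reward
# redistribution, in the spirit of AREL. Names and shapes are assumptions,
# not the authors' implementation (see https://github.com/baicenxiao/AREL).
import torch
import torch.nn as nn


class AgentTemporalAttention(nn.Module):
    """Predicts dense per-time-step rewards from a full episode of agent observations."""

    def __init__(self, obs_dim: int, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)
        # Temporal attention: attends across time-steps of each agent's trajectory.
        self.temporal_attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        # Agent attention: attends across agents within a single time-step.
        self.agent_attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.reward_head = nn.Linear(embed_dim, 1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, n_agents, obs_dim) -> dense rewards: (batch, T)
        b, t, n, _ = obs.shape
        x = self.embed(obs)                                   # (b, T, n, d)

        # Temporal attention over each agent's trajectory.
        x_t = x.permute(0, 2, 1, 3).reshape(b * n, t, -1)     # (b*n, T, d)
        x_t, _ = self.temporal_attn(x_t, x_t, x_t)
        x = x_t.reshape(b, n, t, -1).permute(0, 2, 1, 3)      # (b, T, n, d)

        # Agent attention within each time-step.
        x_a = x.reshape(b * t, n, -1)                         # (b*T, n, d)
        x_a, _ = self.agent_attn(x_a, x_a, x_a)
        x = x_a.reshape(b, t, n, -1)

        # Pool over agents and predict one redistributed reward per time-step.
        return self.reward_head(x.mean(dim=2)).squeeze(-1)    # (b, T)


if __name__ == "__main__":
    model = AgentTemporalAttention(obs_dim=16)
    episode = torch.randn(2, 25, 3, 16)   # 2 episodes, 25 steps, 3 agents
    dense_rewards = model(episode)
    print(dense_rewards.shape)            # torch.Size([2, 25])
```

In this sketch, the dense per-step rewards would be trained so that their sum over an episode matches the delayed episodic reward, and could then be fed to any MARL algorithm in place of the sparse global reward, consistent with the abstract's claim that AREL is algorithm-agnostic.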