Paper Title
UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted joint action value function can prevent them from solving tasks that require significant coordination between agents at a given timestep. We show that this problem can be overcome by improving the joint exploration of all agents during training. Specifically, we propose a novel MARL approach called Universal Value Exploration (UneVEn) that learns a set of related tasks simultaneously with a linear decomposition of universal successor features. With the policies of already solved related tasks, the joint exploration process of all agents can be improved to help them achieve better coordination. Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
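For context on the decompositions mentioned in the abstract, here is a minimal sketch of the standard forms from the VDN/QMIX and successor-feature literature; the notation below is illustrative and not necessarily the paper's own:

VDN (additive decomposition):  Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) = \sum_{i=1}^{n} Q_i(\tau_i, u_i)
QMIX (monotonic mixing):       \frac{\partial Q_{tot}(\boldsymbol{\tau}, \mathbf{u})}{\partial Q_i(\tau_i, u_i)} \ge 0 \quad \forall i
Successor features:            Q^{\pi}_{\mathbf{w}}(s, \mathbf{u}) = \psi^{\pi}(s, \mathbf{u})^{\top} \mathbf{w}, \quad \text{where } r_{\mathbf{w}}(s, \mathbf{u}) = \phi(s, \mathbf{u})^{\top} \mathbf{w}

Here \tau_i and u_i are agent i's action-observation history and action, Q_{tot} is the joint action-value function, and \mathbf{w} is a task weight vector; with successor features, the same features \psi can yield value estimates for a family of related tasks by varying \mathbf{w}, which is the mechanism the abstract's "set of related tasks" relies on.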