Paper Title

QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning

Paper Authors

Kyunghwan Son, Sungsoo Ahn, Roben Delos Reyes, Jinwoo Shin, Yung Yi

Paper Abstract

QTRAN is a multi-agent reinforcement learning (MARL) algorithm capable of learning the largest class of joint-action value functions known to date. However, despite its strong theoretical guarantees, it has shown poor empirical performance in complex environments such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we identify the performance bottlenecks of QTRAN and propose a substantially improved version, coined QTRAN++. Our gains come from (i) stabilizing the training objective of QTRAN, (ii) removing the strict role separation between the action-value estimators of QTRAN, and (iii) introducing a multi-head mixing network for value transformation. Through extensive evaluation, we confirm that our diagnosis is correct and that QTRAN++ successfully bridges the gap between empirical performance and theoretical guarantees. In particular, QTRAN++ achieves new state-of-the-art performance in the SMAC environment. The code will be released.
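
The abstract's third improvement, a multi-head mixing network for value transformation, can be illustrated with a small sketch. Below is a minimal, hypothetical PyTorch implementation assuming a QMIX-style hypernetwork mixer extended with multiple heads; the class name `MultiHeadMixer`, the head count, and all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadMixer(nn.Module):
    """Minimal sketch (not the paper's exact architecture): each head
    mixes per-agent utilities into a joint-value component conditioned
    on the global state, and the components are summed into Q_jt."""

    def __init__(self, n_agents, state_dim, n_heads=4, embed_dim=32):
        super().__init__()
        self.n_agents, self.n_heads, self.embed_dim = n_agents, n_heads, embed_dim
        # Hypernetworks generate state-dependent mixing weights, QMIX-style.
        self.hyper_w1 = nn.Linear(state_dim, n_heads * n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, n_heads * embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, n_heads * embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, n_heads)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        # Non-negative weights keep the mixing monotonic in each agent's utility.
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_heads, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, self.n_heads, 1, self.embed_dim)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.n_heads, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, self.n_heads, 1, 1)

        qs = agent_qs.view(bs, 1, 1, self.n_agents)        # broadcast over heads
        hidden = F.elu(torch.matmul(qs, w1) + b1)          # (bs, heads, 1, embed)
        head_out = torch.matmul(hidden, w2) + b2           # (bs, heads, 1, 1)
        return head_out.sum(dim=1).view(bs, 1)             # joint value Q_jt
```

As a usage example, `MultiHeadMixer(n_agents=5, state_dim=48)(agent_qs, state)` on a batch of shape `(8, 5)` utilities and `(8, 48)` states returns an `(8, 1)` joint value; summing per-head outputs lets each head specialize while the absolute-valued weights keep every head monotonic in the agent utilities.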
