Paper Title

Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming

Authors

Sachin Konan, Esmaeil Seraj, Matthew Gombolay

Abstract

Information sharing is key in building team cognition and enables coordination and cooperation. High-performing human teams also benefit from acting strategically with hierarchical levels of iterated communication and rationalizability, meaning a human agent can reason about the actions of their teammates in their decision-making. Yet, the majority of prior work in Multi-Agent Reinforcement Learning (MARL) does not support iterated rationalizability and only encourages inter-agent communication, resulting in a suboptimal equilibrium cooperation strategy. In this work, we show that reformulating an agent's policy to be conditional on the policies of its neighboring teammates inherently maximizes a lower bound on Mutual Information (MI) when optimizing under Policy Gradient (PG). Building on the idea of decision-making under bounded rationality and cognitive hierarchy theory, we show that our modified PG approach not only maximizes local agent rewards but also implicitly reasons about MI between agents without the need for any explicit ad-hoc regularization terms. Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state of the art in decentralized cooperative MARL tasks. Our experiments validate the utility of InfoPG by achieving higher sample efficiency and significantly larger cumulative reward in several complex cooperative multi-agent domains.
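The abstract's central mechanism is conditioning each agent's policy on its neighbors' policies and iterating that conditioning over levels of reasoning, in the spirit of cognitive hierarchy theory. The following is a minimal sketch of that idea only; `ToyAgent`, `k_level_policies`, the linear policy, and the uniform level-0 prior are illustrative assumptions for this sketch, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class ToyAgent:
    """Toy linear policy over [own observation ; mean of neighbors'
    level-(k-1) action distributions]. Stand-in for a learned network."""
    def __init__(self, obs_dim, n_actions):
        self.n_actions = n_actions
        self.W = rng.normal(scale=0.5, size=(n_actions, obs_dim + n_actions))

    def policy(self, obs, neighbor_dist):
        # Condition the action distribution on what neighbors are
        # believed to do at the previous level of reasoning.
        x = np.concatenate([obs, neighbor_dist])
        return softmax(self.W @ x)

def k_level_policies(agents, observations, adjacency, k):
    """Iterate k levels of reasoning: at each level, every agent
    re-conditions its policy on its neighbors' distributions from
    the previous level (level 0 assumes a uniform prior)."""
    dists = [a.policy(o, np.full(a.n_actions, 1.0 / a.n_actions))
             for a, o in zip(agents, observations)]
    for _ in range(k):
        dists = [a.policy(o, np.mean([dists[j] for j in adjacency[i]], axis=0))
                 for i, (a, o) in enumerate(zip(agents, observations))]
    return dists

# Demo: three fully connected agents, two levels of iterated reasoning.
agents = [ToyAgent(obs_dim=4, n_actions=3) for _ in range(3)]
observations = [rng.normal(size=4) for _ in range(3)]
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
dists = k_level_policies(agents, observations, adjacency, k=2)
```

In a PG setting, each agent would sample its action from its final-level distribution and backpropagate through the whole conditioning chain; it is this dependence of one agent's policy on its neighbors' that, per the abstract, implicitly maximizes a lower bound on the inter-agent mutual information.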
