Paper Title
Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games
Paper Authors
Paper Abstract
Recent success in cooperative multi-agent reinforcement learning (MARL) relies on centralized training and policy sharing. Centralized training eliminates the issue of non-stationarity in MARL yet incurs large communication costs, while policy sharing is empirically crucial to efficient learning in certain tasks yet lacks theoretical justification. In this paper, we formally characterize a subclass of cooperative Markov games where agents exhibit a certain form of homogeneity such that policy sharing provably incurs no suboptimality. This enables us to develop the first consensus-based decentralized actor-critic method in which the consensus update is applied to both the actors and the critics while ensuring convergence. We also develop practical algorithms based on our decentralized actor-critic method that reduce the communication cost during training while still yielding policies comparable with centralized training.
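The consensus update the abstract refers to is, at its core, repeated local averaging of parameters over a communication graph. Below is a minimal sketch of that primitive, not the paper's actual algorithm: it assumes a doubly stochastic mixing matrix `W` matching the communication graph, and all names and dimensions are illustrative.

```python
# A minimal sketch of a consensus (gossip) update applied to both actor and
# critic parameters, as the abstract describes. The mixing matrix W and the
# fully mixed topology below are illustrative assumptions, not the paper's
# implementation.
import numpy as np

def consensus_step(params: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One gossip round: row i of the result is agent i's new parameter
    vector, a W-weighted average of its neighbors' current vectors.

    params: (n_agents, dim) stacked parameter vectors (actor or critic).
    W:      (n_agents, n_agents) doubly stochastic mixing matrix whose
            sparsity pattern matches the communication graph.
    """
    return W @ params

# Example: 3 agents with uniform mixing weights for illustration.
n = 3
W = np.full((n, n), 1.0 / n)          # doubly stochastic mixing matrix
actors = np.random.randn(n, 4)        # per-agent actor parameters
critics = np.random.randn(n, 4)       # per-agent critic parameters

for _ in range(10):
    # Local gradient steps would happen here; agents then communicate.
    actors = consensus_step(actors, W)
    critics = consensus_step(critics, W)

print(np.ptp(actors, axis=0))   # per-dimension spread shrinks toward 0,
                                # i.e., agents approach a shared policy
```

Repeated rounds of this averaging drive all agents' parameters toward a common value without a central server, which is how such methods can recover the benefits of policy sharing while communicating only with graph neighbors.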