Paper Title
Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games
Paper Authors
Paper Abstract
Recent success in cooperative multi-agent reinforcement learning (MARL) relies on centralized training and policy sharing. Centralized training eliminates the issue of non-stationarity in MARL yet incurs large communication costs, while policy sharing is empirically crucial to efficient learning in certain tasks yet lacks theoretical justification. In this paper, we formally characterize a subclass of cooperative Markov games where agents exhibit a certain form of homogeneity such that policy sharing provably incurs no suboptimality. This enables us to develop the first consensus-based decentralized actor-critic method in which the consensus update is applied to both the actors and the critics while ensuring convergence. We also develop practical algorithms based on our decentralized actor-critic method that reduce the communication cost during training while still yielding policies comparable with centralized training.
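The consensus update the abstract refers to is, at its core, repeated local averaging of parameters over a communication graph. Below is a minimal sketch of that primitive, not the paper's actual algorithm: it assumes a doubly stochastic mixing matrix `W` matching the communication graph, and all names and dimensions are illustrative.

```python
# A minimal sketch of a consensus (gossip) update applied to both actor and
# critic parameters, as the abstract describes. The mixing matrix W and the
# fully mixed topology below are illustrative assumptions, not the paper's
# implementation.
import numpy as np

def consensus_step(params: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One gossip round: row i of the result is agent i's new parameter
    vector, a W-weighted average of its neighbors' current vectors.

    params: (n_agents, dim) stacked parameter vectors (actor or critic).
    W:      (n_agents, n_agents) doubly stochastic mixing matrix whose
            sparsity pattern matches the communication graph.
    """
    return W @ params

# Example: 3 agents with uniform mixing weights for illustration.
n = 3
W = np.full((n, n), 1.0 / n)          # doubly stochastic mixing matrix
actors = np.random.randn(n, 4)        # per-agent actor parameters
critics = np.random.randn(n, 4)       # per-agent critic parameters

for _ in range(10):
    # Local gradient steps would happen here; agents then communicate.
    actors = consensus_step(actors, W)
    critics = consensus_step(critics, W)

print(np.ptp(actors, axis=0))   # per-dimension spread shrinks toward 0,
                                # i.e., agents approach a shared policy
```

Repeated rounds of this averaging drive all agents' parameters toward a common value without a central server, which is how such methods can recover the benefits of policy sharing while communicating only with graph neighbors.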