Paper Title
Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation
Paper Authors
Paper Abstract
We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum stochastic games. We propose a new Q-learning-type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy during the Q-function updates. We prove that under certain conditions, by updating the regularized Q-function, the algorithm converges to a Nash equilibrium. We also demonstrate the proposed algorithm's ability to transfer previous training experiences, enabling the agents to adapt quickly to new environments. We provide a dynamic hyperparameter scheduling scheme to further expedite convergence. Empirical results on a number of stochastic games verify that the proposed algorithm converges to the Nash equilibrium, while exhibiting a major speed-up over existing algorithms.
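The abstract does not give the update rule, but the core idea it names is replacing the exact Nash (minimax) computation of the stage game with an entropy-regularized soft policy. The following is a minimal sketch of that idea only, not the paper's algorithm: it approximates the entropy-regularized equilibrium of a single stage game with payoff matrix Q_s via damped smoothed (softmax) best responses. The function names soft_best_response and regularized_stage_policies, the temperature tau, and the damping rate lr are illustrative assumptions.

```python
import numpy as np

def soft_best_response(payoffs, tau):
    # Boltzmann (softmax) policy over expected payoffs; tau > 0 is the
    # entropy-regularization temperature (hypothetical parameterization).
    z = payoffs / tau
    z = z - z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

def regularized_stage_policies(Q_s, tau, iters=500, lr=0.1):
    """Approximate the entropy-regularized equilibrium of the stage game
    with payoff matrix Q_s (rows: maximizer's actions, columns:
    minimizer's actions) by damped smoothed best-response iteration."""
    n_a, n_b = Q_s.shape
    pi_a = np.full(n_a, 1.0 / n_a)
    pi_b = np.full(n_b, 1.0 / n_b)
    for _ in range(iters):
        br_a = soft_best_response(Q_s @ pi_b, tau)        # maximizer's soft reply
        br_b = soft_best_response(-(Q_s.T @ pi_a), tau)   # minimizer's soft reply
        pi_a = (1.0 - lr) * pi_a + lr * br_a
        pi_b = (1.0 - lr) * pi_b + lr * br_b
    value = pi_a @ Q_s @ pi_b                             # regularized stage value
    return value, pi_a, pi_b
```

In a Minimax-Q-style scheme, such a stage value would then be backed up as Q(s,a,b) ← (1-α) Q(s,a,b) + α (r + γ v(s')), and the dynamic hyperparameter schedule mentioned in the abstract would plausibly correspond to annealing quantities like tau during training; both points are assumptions, as the abstract does not specify them.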