Paper Title
Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation
Paper Authors
Paper Abstract
We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum stochastic games. We propose a new Q-learning-type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy during the Q-function updates. We prove that under certain conditions, by updating the regularized Q-function, the algorithm converges to a Nash equilibrium. We also demonstrate the proposed algorithm's ability to transfer previous training experiences, enabling the agents to adapt quickly to new environments. We provide a dynamic hyperparameter scheduling scheme to further expedite convergence. Empirical results on a number of stochastic games verify that the proposed algorithm converges to the Nash equilibrium, while exhibiting a major speed-up over existing algorithms.
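The abstract does not give the update rule, but the core idea it names is replacing the exact Nash (minimax) computation of the stage game with an entropy-regularized soft policy. The following is a minimal sketch of that idea only, not the paper's algorithm: it approximates the entropy-regularized equilibrium of a single stage game with payoff matrix Q_s via damped smoothed (softmax) best responses. The function names soft_best_response and regularized_stage_policies, the temperature tau, and the damping rate lr are illustrative assumptions.

```python
import numpy as np

def soft_best_response(payoffs, tau):
    # Boltzmann (softmax) policy over expected payoffs; tau > 0 is the
    # entropy-regularization temperature (hypothetical parameterization).
    z = payoffs / tau
    z = z - z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

def regularized_stage_policies(Q_s, tau, iters=500, lr=0.1):
    """Approximate the entropy-regularized equilibrium of the stage game
    with payoff matrix Q_s (rows: maximizer's actions, columns:
    minimizer's actions) by damped smoothed best-response iteration."""
    n_a, n_b = Q_s.shape
    pi_a = np.full(n_a, 1.0 / n_a)
    pi_b = np.full(n_b, 1.0 / n_b)
    for _ in range(iters):
        br_a = soft_best_response(Q_s @ pi_b, tau)        # maximizer's soft reply
        br_b = soft_best_response(-(Q_s.T @ pi_a), tau)   # minimizer's soft reply
        pi_a = (1.0 - lr) * pi_a + lr * br_a
        pi_b = (1.0 - lr) * pi_b + lr * br_b
    value = pi_a @ Q_s @ pi_b                             # regularized stage value
    return value, pi_a, pi_b
```

In a Minimax-Q-style scheme, such a stage value would then be backed up as Q(s,a,b) ← (1-α) Q(s,a,b) + α (r + γ v(s')), and the dynamic hyperparameter schedule mentioned in the abstract would plausibly correspond to annealing quantities like tau during training; both points are assumptions, as the abstract does not specify them.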