Paper title

Fictitious play in zero-sum stochastic games

Authors

Sayin, Muhammed O., Parise, Francesca, Ozdaglar, Asuman

Abstract

We present a novel variant of fictitious play dynamics that combines classical fictitious play with Q-learning for stochastic games, and we analyze its convergence properties in two-player zero-sum stochastic games. In our dynamics, players form beliefs about the opponent's strategy and about their own continuation payoff (Q-function), and play a greedy best response using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the beliefs on Q-functions are updated on a slower timescale than the beliefs on strategies. We show that in both the model-based and the model-free case (the latter without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
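
The two-timescale idea in the abstract can be illustrated with a minimal model-based sketch. This is not the authors' exact update rule: the function names (`run_fictitious_play`, `action_values`), the step sizes `alpha = 1/k` and `beta = 1/(1 + k*log(k+1))`, and the choice to update the Q-beliefs for every action pair of the visited state are all assumptions made for illustration. The only property taken from the abstract is that the Q-belief step is asymptotically smaller than the strategy-belief step.

```python
import math
import random


def action_values(q_s, belief, is_row_player):
    """Expected Q-value of each own action against the believed opponent mix."""
    if is_row_player:  # average over the opponent's column actions
        return [sum(q * p for q, p in zip(row, belief)) for row in q_s]
    # column player: average over the opponent's row actions
    return [sum(q_s[a][b] * belief[a] for a in range(len(q_s)))
            for b in range(len(q_s[0]))]


def run_fictitious_play(reward, trans, gamma=0.8, steps=2000, seed=0):
    """Hypothetical sketch. reward[s][a][b] is player 1's payoff (player 2
    receives its negative); trans[s][a][b] lists next-state probabilities."""
    rng = random.Random(seed)
    n_s, n_a, n_b = len(reward), len(reward[0]), len(reward[0][0])
    bel_on_p1 = [[1.0 / n_a] * n_a for _ in range(n_s)]  # held by player 2
    bel_on_p2 = [[1.0 / n_b] * n_b for _ in range(n_s)]  # held by player 1
    q1 = [[[0.0] * n_b for _ in range(n_a)] for _ in range(n_s)]
    q2 = [[[0.0] * n_b for _ in range(n_a)] for _ in range(n_s)]
    visits = [0] * n_s
    s = 0
    for _ in range(steps):
        visits[s] += 1
        k = visits[s]
        alpha = 1.0 / k                              # fast: strategy beliefs
        beta = 1.0 / (1.0 + k * math.log(k + 1))     # slow: beta/alpha -> 0

        # Greedy best responses to the current beliefs.
        vals1 = action_values(q1[s], bel_on_p2[s], True)
        vals2 = action_values(q2[s], bel_on_p1[s], False)
        a = max(range(n_a), key=vals1.__getitem__)
        b = max(range(n_b), key=vals2.__getitem__)

        # Fast timescale: move each belief toward the observed action.
        bel_on_p1[s] = [p + alpha * ((i == a) - p)
                        for i, p in enumerate(bel_on_p1[s])]
        bel_on_p2[s] = [p + alpha * ((j == b) - p)
                        for j, p in enumerate(bel_on_p2[s])]

        # Slow timescale: move Q-beliefs toward the one-step lookahead target.
        v1 = [max(action_values(q1[t], bel_on_p2[t], True)) for t in range(n_s)]
        v2 = [max(action_values(q2[t], bel_on_p1[t], False)) for t in range(n_s)]
        for i in range(n_a):
            for j in range(n_b):
                cont1 = sum(p * v for p, v in zip(trans[s][i][j], v1))
                cont2 = sum(p * v for p, v in zip(trans[s][i][j], v2))
                q1[s][i][j] += beta * (reward[s][i][j] + gamma * cont1 - q1[s][i][j])
                q2[s][i][j] += beta * (-reward[s][i][j] + gamma * cont2 - q2[s][i][j])

        # Sample the next state from the known transition model.
        s = rng.choices(range(n_s), weights=trans[s][a][b])[0]
    return bel_on_p1, bel_on_p2, q1, q2
```

Calling `run_fictitious_play(reward, trans)` on a small game returns the per-state strategy beliefs and Q-beliefs of both players. A model-free variant would replace the expectation over `trans` with the sampled next state and the observed payoff; that variant is not shown here.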
