Paper Title
Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium
Paper Authors
Abstract
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation, where the action-value function is approximated by a function in a Reproducing Kernel Hilbert Space (RKHS). The key challenge is how to perform exploration in the high-dimensional function space. We propose a novel online learning algorithm that finds a Nash equilibrium by minimizing the duality gap. At the core of our algorithm are upper and lower confidence bounds derived from the principle of optimism in the face of uncertainty. We prove that our algorithm attains an $O(\sqrt{T})$ regret with polynomial computational complexity, under very mild assumptions on the reward function and the underlying dynamics of the Markov Game. We also propose several extensions of our algorithm, including an algorithm with a Bernstein-type bonus that achieves a tighter regret bound, and another algorithm for model misspecification that can be applied to neural function approximation.
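To make the confidence-bound idea in the abstract concrete, below is a minimal illustrative sketch (not the paper's algorithm) of how upper and lower confidence bounds on an action-value estimate can be computed via kernel ridge regression in an RKHS. The RBF kernel, the regularization parameter `lam`, and the bonus scale `beta` are illustrative assumptions; in the paper, the bonus scale would be set by the regret analysis.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """RBF kernel matrix between rows of X and rows of Y (an assumed kernel choice)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def kernel_confidence_bounds(X_hist, y_hist, X_query, lam=1.0, beta=1.0):
    """Kernel ridge estimate of Q with optimistic (UCB) and pessimistic (LCB) bonuses.

    X_hist : (n, d) observed state-action pairs
    y_hist : (n,)   regression targets (e.g. reward plus next-state value)
    X_query: (m, d) state-action pairs at which to evaluate the bounds
    """
    K = rbf_kernel(X_hist, X_hist)                     # Gram matrix of past data
    k_q = rbf_kernel(X_hist, X_query)                  # cross-kernel, shape (n, m)
    A = K + lam * np.eye(len(X_hist))
    alpha = np.linalg.solve(A, y_hist)
    mean = k_q.T @ alpha                               # kernel ridge regression mean
    # Uncertainty width: k(x, x) - k_x^T (K + lam I)^{-1} k_x, as in GP-UCB-style bonuses
    k_xx = np.ones(len(X_query))                       # for the RBF kernel, k(x, x) = 1
    width2 = k_xx - np.einsum("nm,nm->m", k_q, np.linalg.solve(A, k_q))
    bonus = beta * np.sqrt(np.maximum(width2, 0.0))
    return mean + bonus, mean - bonus                  # upper and lower confidence bounds
```

In this sketch, the max-player would act greedily with respect to the upper bound and the min-player with respect to the lower bound; the gap between the two bounds shrinks as data accumulates, mirroring how the duality gap is driven down.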