Paper Title
Function Approximation for Solving Stackelberg Equilibrium in Large Perfect Information Games
Paper Authors
Paper Abstract
Function approximation (FA) has been a critical component in solving large zero-sum games. Yet, little attention has been given to FA for solving \textit{general-sum} extensive-form games, despite their being widely regarded as computationally more challenging than their fully competitive or cooperative counterparts. A key challenge is that for many equilibria in general-sum games, there is no simple analogue to the state value function used in Markov Decision Processes and zero-sum games. In this paper, we propose learning the \textit{Enforceable Payoff Frontier} (EPF) -- a generalization of the state value function to general-sum games. We approximate the optimal \textit{Stackelberg extensive-form correlated equilibrium} by representing EPFs with neural networks and training them with appropriate backup operations and loss functions. This is the first method that applies FA to the Stackelberg setting, allowing us to scale to much larger games while still enjoying performance guarantees that depend on the FA error. Additionally, our proposed method guarantees incentive compatibility and is easy to evaluate, without having to depend on self-play or approximate best-response oracles.
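To make the EPF idea more concrete, below is a minimal sketch of one plausible way to represent a frontier with a neural network and fit it to backed-up targets. The abstract does not specify implementation details, so everything here is an illustrative assumption: the discretization of the follower-payoff axis, the network shape, the pointwise-max backup at a leader node (which ignores the follower's incentive constraints), and the plain regression loss.

```python
# Minimal sketch (not the paper's implementation): represent an Enforceable
# Payoff Frontier (EPF) as leader payoffs evaluated on a fixed grid of
# follower payoffs, and regress the network onto toy backed-up targets.
import torch
import torch.nn as nn

N_GRID = 32      # assumed discretization of the follower-payoff axis
STATE_DIM = 64   # assumed size of the state feature vector

class EPFNet(nn.Module):
    """Maps a state encoding to leader payoffs at N_GRID follower-payoff levels."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, N_GRID),   # one leader payoff per grid point
        )

    def forward(self, state):
        return self.body(state)

def leader_node_backup(child_epfs):
    """Toy backup at a leader decision node: pointwise upper envelope over the
    children's frontiers (follower incentive constraints are omitted here)."""
    # child_epfs: (batch, n_children, N_GRID)
    return child_epfs.max(dim=1).values

# One illustrative regression step against backed-up targets.
net = EPFNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

states = torch.randn(16, STATE_DIM)       # placeholder state encodings
child_epfs = torch.randn(16, 3, N_GRID)   # placeholder child frontiers
targets = leader_node_backup(child_epfs)

loss = nn.functional.mse_loss(net(states), targets)  # assumed regression loss
opt.zero_grad()
loss.backward()
opt.step()
print(f"EPF regression loss: {loss.item():.4f}")
```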