Paper Title

Equilibria in Repeated Games under No-Regret with Dynamic Benchmarks

Authors

Ludovico Crippa, Yonatan Gur, Bar Light

Abstract

In repeated games, strategies are often evaluated by their ability to guarantee the performance of the single best action that is selected in hindsight, a property referred to as \emph{Hannan consistency}, or \emph{no-regret}. However, the effectiveness of the single best action as a yardstick to evaluate strategies is limited, as any static action may perform poorly in common dynamic settings. Our work therefore turns to a more ambitious notion of \emph{dynamic benchmark consistency}, which guarantees the performance of the best \emph{dynamic} sequence of actions, selected in hindsight subject to a constraint on the allowable number of action changes. Our main result establishes that for any joint empirical distribution of play that may arise when all players deploy no-regret strategies, there exist dynamic benchmark consistent strategies such that if all players deploy these strategies the same empirical distribution emerges when the horizon is large enough. This result demonstrates that although dynamic benchmark consistent strategies have a different algorithmic structure and provide significantly enhanced individual assurances, they lead to the same equilibrium set as no-regret strategies. Moreover, the proof of our main result uncovers the capacity of independent algorithms with strong individual guarantees to foster a strong form of coordination.
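To make the two benchmarks in the abstract concrete, the following is a minimal sketch, not the authors' construction. It assumes a single player whose hindsight payoffs are given as a hypothetical table `payoffs[t][a]` (the payoff of action `a` in period `t`), and it compares the best fixed action with the best action sequence that uses at most `max_changes` action changes. The function names and the toy data are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def best_static_payoff(payoffs: Sequence[Sequence[float]]) -> float:
    """Total payoff of the single best fixed action chosen in hindsight."""
    n_actions = len(payoffs[0])
    return max(sum(row[a] for row in payoffs) for a in range(n_actions))


def best_dynamic_payoff(payoffs: Sequence[Sequence[float]], max_changes: int) -> float:
    """Total payoff of the best action sequence chosen in hindsight,
    subject to at most `max_changes` changes of action (hypothetical helper)."""
    n_actions = len(payoffs[0])
    neg_inf = float("-inf")
    # best[c][a]: best total payoff so far over sequences that end in action a
    # and use at most c action changes.
    best = [[payoffs[0][a] for a in range(n_actions)] for _ in range(max_changes + 1)]
    for row in payoffs[1:]:
        updated = []
        for c in range(max_changes + 1):
            layer = []
            for a in range(n_actions):
                stay = best[c][a]
                if c > 0:
                    # Switching into a from another action b consumes one change.
                    switch = max(
                        (best[c - 1][b] for b in range(n_actions) if b != a),
                        default=neg_inf,
                    )
                else:
                    switch = neg_inf
                layer.append(max(stay, switch) + row[a])
            updated.append(layer)
        best = updated
    return max(best[max_changes])


# Toy environment (assumed): action 0 pays off early, action 1 pays off late.
payoffs = [[1, 0], [1, 0], [0, 1], [0, 1]]
static_benchmark = best_static_payoff(payoffs)
dynamic_benchmark = best_dynamic_payoff(payoffs, max_changes=1)
```

In this toy environment any fixed action earns at most half of what a sequence with a single switch earns. This is the sense in which a static action can perform poorly in a changing setting, and regret measured against the dynamic benchmark is correspondingly harder to keep small.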
