Learning to Play Sequential Games versus Unknown Opponents

Paper Title

Learning to Play Sequential Games versus Unknown Opponents

Authors

Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

Abstract

We consider a repeated sequential game between a learner, who plays first, and an opponent, who responds to the chosen action. We seek to design strategies that let the learner interact successfully with the opponent. While most previous approaches assume a known opponent model, we focus on the setting in which the opponent's model is unknown. To this end, we use kernel-based regularity assumptions to capture and exploit the structure in the opponent's response. We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents. The algorithm combines ideas from bilevel optimization and online learning to balance exploration (learning about the opponent's model) against exploitation (selecting highly rewarding actions for the learner). Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response and scale sublinearly with the number of game rounds. Moreover, we specialize our approach to repeated Stackelberg games and empirically demonstrate its effectiveness on traffic routing and wildlife conservation tasks.
