Efficient exploration of zero-sum stochastic games

Paper title

Efficient exploration of zero-sum stochastic games

Authors

Martin, Carlos; Sandholm, Tuomas

Abstract

We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay, such as in financial or military simulations and computer games. During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well. After that, the algorithm has to produce a strategy that has low exploitability. Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly. For the stochastic game setting, we propose using the distribution of state-action value functions induced by a belief distribution over possible environments. We compare the performance of various exploration strategies for this task, including generalizations of Thompson sampling and Bayes-UCB to this new setting. These two consistently outperform other strategies.
