Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi

Paper Title

Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi

Authors

Rodrigo Canaan, Xianbo Gao, Julian Togelius, Andy Nealen, Stefan Menzel

Abstract

Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires an agent to adapt to its partner's strategies with no previous coordination. Evaluating an agent in this setting requires a diverse population of potential partners, but so far, the behavioral diversity of agents has not been considered in a systematic way. This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate diverse populations for this purpose, and generates a population of diverse Hanabi agents using MAP-Elites. We also postulate that agents can benefit from a diverse population during training and implement a simple "meta-strategy" for adapting to an agent's perceived behavioral niche. We show this meta-strategy can work better than generalist strategies even outside the population it was trained with if its partner's behavioral niche can be correctly inferred, but in practice a partner's behavior depends on and interferes with the meta-agent's own behavior, suggesting an avenue for future research in characterizing another agent's behavior during gameplay.
