Paper Title

Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi

Authors

Rodrigo Canaan, Xianbo Gao, Julian Togelius, Andy Nealen, Stefan Menzel

Abstract

Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires agents to adapt to their partners' strategies with no previous coordination. Evaluating an agent in this setting requires a diverse population of potential partners, but so far, the behavioral diversity of agents has not been considered in a systematic way. This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate diverse populations for this purpose, and generates a population of diverse Hanabi agents using MAP-Elites. We also postulate that agents can benefit from a diverse population during training and implement a simple "meta-strategy" for adapting to an agent's perceived behavioral niche. We show this meta-strategy can work better than generalist strategies even outside the population it was trained with if its partner's behavioral niche can be correctly inferred, but in practice a partner's behavior depends on and interferes with the meta-agent's own behavior, suggesting an avenue for future research in characterizing another agent's behavior during gameplay.
