通过指数反事实遗憾最小化解决不完美的信息游戏

论文标题

通过指数反事实遗憾最小化解决不完美的信息游戏

Solving imperfect-information games via exponential counterfactual regret minimization

论文作者

Li, Huale, Wang, Xuan, Qi, Shuhan, Zhang, Jiajia, Liu, Yang, Wu, Yulin, Jia, Fengwei

论文摘要

通常，可以将两项代理决策问题建模为两人游戏，而典型的解决方案是在此类游戏中找到NASH平衡。反事实后悔最小化（CFR）是一种众所周知的方法，可以在具有不完美信息的两人零和游戏中找到NASH平衡策略。 CFR方法采用遗憾匹配算法迭代，以逐步降低后悔值，从而使平均策略达到NASH平衡。尽管基于CFR的方法在不完美的信息游戏领域取得了重大成功，但仍有提高收敛效率的范围。为了应对这一挑战，我们提出了一种基于CFR的新型方法，名为指数反事实遗憾最小化（ECFR）。使用ECFR，使用指数加权技术来重新续签迭代过程中的瞬时后悔价值。提供理论证明以确保ECFR算法的收敛性。一组广泛的实验测试的结果表明，ECFR算法的收敛速度比当前基于CFR的当前方法更快。

In general, two-agent decision-making problems can be modeled as a two-player game, and a typical solution is to find a Nash equilibrium in such game. Counterfactual regret minimization (CFR) is a well-known method to find a Nash equilibrium strategy in a two-player zero-sum game with imperfect information. The CFR method adopts a regret matching algorithm iteratively to reduce regret values progressively, enabling the average strategy to approach a Nash equilibrium. Although CFR-based methods have achieved significant success in the field of imperfect information games, there is still scope for improvement in the efficiency of convergence. To address this challenge, we propose a novel CFR-based method named exponential counterfactual regret minimization (ECFR). With ECFR, an exponential weighting technique is used to reweight the instantaneous regret value during the process of iteration. A theoretical proof is provided to guarantees convergence of the ECFR algorithm. The result of an extensive set of experimental tests demostrate that the ECFR algorithm converges faster than the current state-of-the-art CFR-based methods.

下载PDF全文

下载文献需遵守相关版权规定

论文标题