HEX and Neurodynamic Programming

Paper Title

HEX and Neurodynamic Programming

Author

Banerjee, Debangshu

Abstract

Hex is a complex game with a high branching factor. For the first time, we attempt to solve Hex without game-tree structures and their associated pruning methods. We also abstain from any heuristic information about Virtual Connections or Semi-Virtual Connections, which were used in all previously known computer versions of the game. The H-search algorithm, which was the basis for finding such connections and had been used with success in previous Hex-playing agents, has been forgone. Instead, we use reinforcement learning through self-play, with neural networks as function approximators, to bypass the problems of the high branching factor and of maintaining large tables for state-action evaluations. Our code is based primarily on NeuroHex. The inspiration is drawn from the recent success of AlphaGo Zero.
