Paper Title
Evaluating Generalisation in General Video Game Playing
Paper Authors
Paper Abstract
The General Video Game Artificial Intelligence (GVGAI) competition has been running for several years with various tracks. This paper focuses on the challenge of the GVGAI learning track, in which 3 games are selected and 2 levels are given for training, while 3 hidden levels are left for evaluation. This setup poses a difficult challenge for current Reinforcement Learning (RL) algorithms, as they typically require much more data. This work investigates 3 versions of the Advantage Actor-Critic (A2C) algorithm, each trained on 2 of the 5 levels available in the GVGAI framework, and compares their performance on all levels. The selected subset of games has different characteristics, such as stochasticity, reward distribution and objectives. We found that stochasticity improves generalisation, but too much can cause the algorithms to fail to learn the training levels. The quality of the training levels also matters: different sets of training levels can boost generalisation across all levels. In the GVGAI competition, agents are scored based on their win rates and then on the scores achieved in the games. We found that solely using the rewards provided by the game might not encourage winning.
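The abstract centres on the Advantage Actor-Critic (A2C) algorithm. As background, the loss terms A2C optimises can be sketched numerically for a single transition; this is a minimal illustrative sketch with made-up values and coefficient names, not code from the paper's experiments.

```python
import math

def a2c_loss_terms(log_prob, value, ret, entropy,
                   value_coef=0.5, entropy_coef=0.01):
    """Return (policy_loss, value_loss, total_loss) for one transition.

    log_prob: log pi(a|s) of the action taken
    value:    critic estimate V(s)
    ret:      n-step return R (treated as a constant target)
    entropy:  policy entropy at s (encourages exploration)
    The coefficient values here are illustrative defaults, not the paper's.
    """
    advantage = ret - value              # A = R - V(s)
    policy_loss = -log_prob * advantage  # raise probability of better-than-expected actions
    value_loss = advantage ** 2          # regress the critic toward the return
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy
    return policy_loss, value_loss, total

# Example: the return exceeds the critic's estimate, so the advantage is positive
p, v, t = a2c_loss_terms(log_prob=math.log(0.25), value=1.0, ret=1.5, entropy=1.2)
```

With a positive advantage, the policy term pushes up the probability of the chosen action, while the entropy bonus discourages premature convergence to a deterministic policy.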