Agent57: Outperforming the Atari Human Benchmark

Paper Title

Agent57: Outperforming the Atari Human Benchmark

Authors

Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell

Abstract

Atari games have been a long-standing benchmark in the reinforcement learning (RL) community for the past decade. This benchmark was proposed to test general competency of RL algorithms. Previous work has achieved good average performance by doing outstandingly well on many games of the set, but very poorly in several of the most challenging games. We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. To achieve this result, we train a neural network which parameterizes a family of policies ranging from very exploratory to purely exploitative. We propose an adaptive mechanism to choose which policy to prioritize throughout the training process. Additionally, we utilize a novel parameterization of the architecture that allows for more consistent and stable learning.
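The adaptive mechanism mentioned in the abstract can be pictured as a multi-armed bandit whose arms are the members of the policy family (in the paper, each arm indexes one policy, from very exploratory to purely exploitative). The Agent57 paper describes this meta-controller as a sliding-window UCB bandit. The following is only a minimal sketch of that idea, not the authors' implementation: the class name `SlidingWindowUCB`, the window size, the bonus weight `beta`, the epsilon-greedy fallback and the use of the episode return as the bandit reward are all assumptions made here for illustration.

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """Hypothetical sliding-window UCB meta-controller over num_arms policies.

    Each arm is an index into the policy family. The reward passed to
    update() is assumed to be the episode's extrinsic return.
    """

    def __init__(self, num_arms, window=90, beta=1.0, epsilon=0.5, seed=0):
        self.num_arms = num_arms
        self.beta = beta                      # weight of the exploration bonus
        self.epsilon = epsilon                # chance of picking a uniformly random arm
        self.history = deque(maxlen=window)   # only the most recent (arm, reward) pairs
        self.rng = random.Random(seed)

    def select(self):
        # Recompute per-arm counts and reward sums from the recent window only,
        # so old outcomes stop influencing the choice as the agent improves.
        counts = [0] * self.num_arms
        sums = [0.0] * self.num_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # Any arm absent from the window is tried before trusting UCB scores.
        for arm in range(self.num_arms):
            if counts[arm] == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_arms)
        total = len(self.history)
        # UCB score: windowed mean reward plus a bonus that shrinks with visits.
        scores = [
            sums[a] / counts[a] + self.beta * math.sqrt(math.log(total) / counts[a])
            for a in range(self.num_arms)
        ]
        return max(range(self.num_arms), key=scores.__getitem__)

    def update(self, arm, reward):
        # Record the outcome of the episode played with policy `arm`.
        self.history.append((arm, reward))
```

In use, an actor would call `select()` at the start of each episode to pick which policy to run and `update()` with that episode's return at the end. Because the statistics are recomputed over a sliding window, the preferred policy can shift over training, which is the behaviour the abstract attributes to the adaptive mechanism.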
