Paper Title

Mastering Atari with Discrete World Models

Authors

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

Abstract

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to increase sample-efficiency. While learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has remained an open challenge for many years. We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model. The world model uses discrete representations and is trained separately from the policy. DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model. With the same computational budget and wall-clock time, DreamerV2 reaches 200M frames and surpasses the final performance of the top single-GPU agents IQN and Rainbow. DreamerV2 is also applicable to tasks with continuous actions, where it learns an accurate world model of a complex humanoid robot and solves stand-up and walking from only pixel inputs.
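The abstract's key design choice is that the world model uses discrete representations: in DreamerV2 the compact latent state is a vector of categorical variables (32 variables with 32 classes each), encoded as one-hot vectors rather than Gaussians. A minimal NumPy sketch of sampling such a discrete latent is shown below; the function name and shapes are illustrative, and the straight-through gradient trick used for training is omitted:

```python
import numpy as np

def sample_discrete_latent(logits: np.ndarray) -> np.ndarray:
    """Sample one-hot categorical latents from per-variable logits.

    logits: (num_vars, num_classes) unnormalized log-probabilities.
    Returns a (num_vars, num_classes) array of one-hot vectors; in the
    agent, this array is flattened into the compact latent state.
    """
    num_vars, num_classes = logits.shape
    # Softmax over classes, independently for each categorical variable.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Sample one class index per variable and one-hot encode it.
    idx = np.array([np.random.choice(num_classes, p=p) for p in probs])
    return np.eye(num_classes)[idx]

# 32 categorical variables with 32 classes each, as in the paper.
rng = np.random.default_rng(0)
latent = sample_discrete_latent(rng.normal(size=(32, 32)))
print(latent.shape)  # (32, 32)
print(latent.sum())  # 32.0 -- exactly one active class per variable
```

Compared with a continuous Gaussian latent, each variable here commits to exactly one of its classes per step, which the paper argues can make the world model's predictions more robust.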
