Paper Title

Smaller World Models for Reinforcement Learning

Paper Authors

Jan Robine, Tobias Uelwer, Stefan Harmeling

Paper Abstract

Sample efficiency remains a fundamental issue of reinforcement learning. Model-based algorithms try to make better use of data by simulating the environment with a model. We propose a new neural network architecture for world models based on a vector quantized-variational autoencoder (VQ-VAE) to encode observations and a convolutional LSTM to predict the next embedding indices. A model-free PPO agent is trained purely on simulated experience from the world model. We adopt the setup introduced by Kaiser et al. (2020), which only allows 100K interactions with the real environment. We apply our method to 36 Atari environments and show that we reach performance comparable to their SimPLe algorithm, while our model is significantly smaller.
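The abstract names two components: a VQ-VAE that discretizes each observation into a grid of embedding indices, and a convolutional LSTM that predicts the index grid of the next frame. Below is a minimal PyTorch sketch of that pipeline; all class names, layer sizes, and hyperparameters (a 256-entry codebook, 64x64 RGB inputs) are illustrative assumptions rather than the authors' configuration, and training details such as the straight-through gradient, VQ losses, action conditioning, and reward prediction are omitted.

```python
# Minimal sketch of the world model described above, assuming PyTorch.
# All names and sizes are illustrative assumptions, not the paper's exact model.
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Maps encoder features to the indices of their nearest codebook entries."""
    def __init__(self, num_codes=256, code_dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z):                                  # z: (B, D, H, W)
        b, d, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, d)        # (B*H*W, D)
        dists = torch.cdist(flat, self.codebook.weight)    # (B*H*W, K)
        return dists.argmin(dim=1).view(b, h, w)           # discrete indices


class ConvLSTMCell(nn.Module):
    """LSTM cell whose gates are computed with convolutions, so the
    hidden state keeps the spatial layout of the index grid."""
    def __init__(self, in_ch, hidden_ch, kernel=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                              kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.conv(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


class WorldModel(nn.Module):
    """Encodes a frame to discrete indices and predicts the next indices."""
    def __init__(self, num_codes=256, code_dim=64, hidden_ch=128):
        super().__init__()
        self.encoder = nn.Sequential(                      # 64x64 -> 8x8
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, code_dim, 4, stride=2, padding=1))
        self.quantizer = VectorQuantizer(num_codes, code_dim)
        self.cell = ConvLSTMCell(code_dim, hidden_ch)
        self.head = nn.Conv2d(hidden_ch, num_codes, 1)     # logits per grid cell

    def forward(self, obs, state):
        indices = self.quantizer(self.encoder(obs))                   # (B, H, W)
        codes = self.quantizer.codebook(indices).permute(0, 3, 1, 2)  # (B, D, H, W)
        h, state = self.cell(codes, state)
        return self.head(h), state          # (B, K, H, W) next-index logits


model = WorldModel()
obs = torch.rand(1, 3, 64, 64)              # one dummy RGB frame
state = (torch.zeros(1, 128, 8, 8), torch.zeros(1, 128, 8, 8))
logits, state = model(obs, state)           # train with cross-entropy against
                                            # the next frame's embedding indices
```

Rolling such a model forward from an initial observation yields the simulated trajectories on which the model-free PPO agent would then be trained.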
