Paper Title


Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Authors

Daniel Guo, Bernardo Avila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar

Abstract


Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings where building a representation of the unknown environment is crucial to solve the tasks. Here we introduce Prediction of Bootstrap Latents (PBL), a simple and flexible self-supervised representation learning algorithm for multitask deep RL. PBL builds on multistep predictive representations of future observations, and focuses on capturing structured information about environment dynamics. Specifically, PBL trains its representation by predicting latent embeddings of future observations. These latent embeddings are themselves trained to be predictive of the aforementioned representations. These predictions form a bootstrapping effect, allowing the agent to learn more about the key aspects of the environment dynamics. In addition, by defining prediction tasks completely in latent space, PBL provides the flexibility of using multimodal observations involving pixel images, language instructions, rewards and more. We show in our experiments that PBL delivers across-the-board improved performance over state of the art deep RL agents in the DMLab-30 and Atari-57 multitask setting.
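The two coupled objectives described above can be sketched numerically. The following is a minimal NumPy sketch, not the paper's implementation: a forward loss trains the agent state to predict latent embeddings of future observations, and a reverse loss trains those embeddings to predict the agent state back, with stop-gradients making each side chase a fixed target (the bootstrapping effect). All dimensions, the linear "networks" (`W_f`, `W_phi`, `W_g`, `W_h`), and the single shared forward head are illustrative assumptions; the actual agent uses recurrent networks over multimodal observations and per-step predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the paper works with pixel observations).
obs_dim, latent_dim, repr_dim, horizon = 8, 4, 6, 3

# Hypothetical linear stand-ins for the networks:
#   W_f   : observation history -> agent state b_t
#   W_phi : raw observation     -> latent embedding z_t
#   W_g   : b_t -> predicted future latent (forward prediction head)
#   W_h   : z_t -> predicted agent state   (reverse prediction head)
W_f   = rng.normal(size=(obs_dim, repr_dim))
W_phi = rng.normal(size=(obs_dim, latent_dim))
W_g   = rng.normal(size=(repr_dim, latent_dim))
W_h   = rng.normal(size=(latent_dim, repr_dim))

obs = rng.normal(size=(horizon + 1, obs_dim))   # o_t, ..., o_{t+horizon}

b_t = obs[0] @ W_f                              # agent state at time t

def stop_grad(x):
    # Placeholder: in a real framework this blocks gradient flow, so each
    # loss only updates one side while the other serves as a fixed target.
    return x.copy()

# Forward loss: predict the latent embedding of each future observation.
# (A single shared head is a simplification; PBL predicts per future step.)
forward_loss = 0.0
for k in range(1, horizon + 1):
    z_k = stop_grad(obs[k] @ W_phi)             # target latent z_{t+k}
    forward_loss += np.mean((b_t @ W_g - z_k) ** 2)

# Reverse loss: the latent embedding predicts the agent state back.
z_t = obs[0] @ W_phi
reverse_loss = np.mean((z_t @ W_h - stop_grad(b_t)) ** 2)

total_loss = forward_loss + reverse_loss
```

Because both prediction targets live entirely in latent space, swapping the observation modality (pixels, language instructions, rewards) only changes the embedding network, not the losses.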
