Paper Title
Offline Reinforcement Learning from Images with Latent Space Models
Paper Authors
Paper Abstract
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions. Offline RL enables extensive use and re-use of historical datasets, while also alleviating safety concerns associated with online exploration, thereby expanding the real-world applicability of RL. Most prior work in offline RL has focused on tasks with compact state representations. However, the ability to learn directly from rich observation spaces like images is critical for real-world applications such as robotics. In this work, we build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces. Model-based offline RL algorithms have achieved state-of-the-art results on state-based tasks and have strong theoretical guarantees. However, they rely crucially on the ability to quantify uncertainty in the model predictions, which is particularly challenging with image observations. To overcome this challenge, we propose to learn a latent-state dynamics model and represent the uncertainty in the latent space. Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP. In experiments on a range of challenging image-based locomotion and manipulation tasks, we find that our algorithm significantly outperforms previous offline model-free RL methods as well as state-of-the-art online visual model-based RL methods. Moreover, we also find that our approach excels on an image-based drawer-closing task on a real robot using a pre-existing dataset. All results, including videos, can be found online at https://sites.google.com/view/lompo/.
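To make the abstract's core idea concrete, below is a minimal sketch of uncertainty-penalized model rollouts in a learned latent space, assuming an ensemble of latent dynamics models whose disagreement serves as the uncertainty estimate. The names, shapes, and the specific penalty form (`predict_next_latent`, `penalized_reward`, `penalty_scale`) are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
# Minimal sketch (not the paper's exact algorithm): uncertainty-penalized
# rollouts with an ensemble of latent dynamics models. All shapes, names,
# and the disagreement-based penalty are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM, ENSEMBLE = 8, 2, 5

# Stand-in for learned latent dynamics: each ensemble member is a linear map
# over [latent, action]; in practice these would be neural networks trained
# on encoded offline image observations.
members = [rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM + ACTION_DIM))
           for _ in range(ENSEMBLE)]

def predict_next_latent(z, a):
    """Return each ensemble member's prediction of the next latent state."""
    za = np.concatenate([z, a])
    return np.stack([W @ za for W in members])   # shape (ENSEMBLE, LATENT_DIM)

def penalized_reward(z, a, reward_fn, penalty_scale=1.0):
    """Reward minus an uncertainty penalty measured in latent space.

    The penalty is the ensemble disagreement (mean deviation from the
    ensemble-mean prediction), which stays tractable because it never
    requires reconstructing image observations.
    """
    preds = predict_next_latent(z, a)
    disagreement = np.linalg.norm(preds - preds.mean(axis=0), axis=1).mean()
    return reward_fn(z, a) - penalty_scale * disagreement

# Toy usage: score a single action from a random latent state.
z0 = rng.normal(size=LATENT_DIM)
a0 = rng.normal(size=ACTION_DIM)
toy_reward = lambda z, a: -np.linalg.norm(z)     # placeholder reward model
print(penalized_reward(z0, a0, toy_reward))
```

Measuring disagreement over latent predictions rather than reconstructed images is what keeps this style of uncertainty estimate practical for high-dimensional visual observations.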