Paper Title
RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real
Paper Authors
Paper Abstract
Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks like vision-based robotic grasping without the need to manually engineer or separately learn a perception system. However, data for RL is collected by running an agent in the desired environment, and for applications like robotics, running a robot in the real world may be extremely costly and time-consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation transfer effectively to the real world requires additional machinery. Simulations may not match reality, and bridging the simulation-to-reality gap typically requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all the features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotic grasping tasks, we show that it offers a substantial improvement over a number of prior sim-to-real transfer methods, attaining excellent real-world performance with only a modest number of real-world observations.
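The key idea, penalizing any change in the task Q-values introduced by the translation, can be sketched concisely. Below is a minimal PyTorch sketch, not the paper's implementation: `q_net`, `gen_s2r` (sim-to-real generator), `gen_r2s` (real-to-sim generator), and their call signatures are assumed placeholders, and a single Q-network stands in for the jointly trained sim and real Q-networks described in the paper.

```python
import torch
import torch.nn.functional as F

def rl_scene_consistency_loss(q_net, gen_s2r, gen_r2s, sim_img, real_img, action):
    """RL-scene consistency: Q-values should agree across an image, its
    translation, and its cycle reconstruction.

    All arguments are assumed placeholders: q_net(img, action) is a task
    Q-network, gen_s2r / gen_r2s are the two CycleGAN generators.
    """
    # Sim branch: original image, its sim-to-real translation, and the
    # image cycled back to the sim domain.
    fake_real = gen_s2r(sim_img)
    cycled_sim = gen_r2s(fake_real)
    # Real branch: original image, its real-to-sim translation, and the
    # image cycled back to the real domain.
    fake_sim = gen_r2s(real_img)
    cycled_real = gen_s2r(fake_sim)

    def pairwise_q_penalty(images):
        # Sum of squared differences between Q-values of every image pair,
        # so the translation cannot alter the task-relevant content.
        qs = [q_net(img, action) for img in images]
        return sum(
            F.mse_loss(qs[i], qs[j])
            for i in range(len(qs))
            for j in range(i + 1, len(qs))
        )

    return (pairwise_q_penalty([sim_img, fake_real, cycled_sim]) +
            pairwise_q_penalty([real_img, fake_sim, cycled_real]))
```

This penalty would be added to the usual CycleGAN adversarial and cycle-consistency objectives, so the generators learn a realistic translation that leaves the Q-function's predictions for the grasping task intact.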