Paper Title

Reinforcement Learning with Videos: Combining Offline Observations with Interaction

Paper Authors

Schmeckpeper, Karl, Rybkin, Oleh, Daniilidis, Kostas, Levine, Sergey, Finn, Chelsea

Paper Abstract

Reinforcement learning is a powerful framework for robots to acquire skills from experience, but often requires a substantial amount of online data collection. As a result, it is difficult to collect sufficiently diverse experiences that are needed for robots to generalize broadly. Videos of humans, on the other hand, are a readily available source of broad and interesting experiences. In this paper, we consider the question: can we perform reinforcement learning directly on experience collected by humans? This problem is particularly difficult, as such videos are not annotated with actions and exhibit substantial visual domain shift relative to the robot's embodiment. To address these challenges, we propose a framework for reinforcement learning with videos (RLV). RLV learns a policy and value function using experience collected by humans in combination with data collected by robots. In our experiments, we find that RLV is able to leverage such videos to learn challenging vision-based skills with less than half as many samples as RL methods that learn from scratch.
