Paper Title

Disentangling Controllable Object through Video Prediction Improves Visual Reinforcement Learning

Paper Authors

Yuanyi Zhong, Alexander Schwing, Jian Peng

Paper Abstract

In many vision-based reinforcement learning (RL) problems, the agent controls a movable object in its visual field, e.g., the player's avatar in video games and the robotic arm in visual grasping and manipulation. Leveraging action-conditioned video prediction, we propose an end-to-end learning framework to disentangle the controllable object from the observation signal. The disentangled representation is shown to be useful for RL as additional observation channels to the agent. Experiments on a set of Atari games with the popular Double DQN algorithm demonstrate improved sample efficiency and game performance (from 222.8% to 261.4% measured in normalized game scores, with prediction bonus reward).
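
The abstract outlines the core mechanism: an action-conditioned video predictor whose next-frame prediction is split into an action-dependent (controllable) component and an action-independent background, with the resulting controllable-object mask appended to the agent's observation channels. Below is a minimal PyTorch sketch of one plausible reading of that setup; the ActionConditionedPredictor class, layer sizes, and the soft-masking scheme are illustrative assumptions, not the authors' exact architecture.

```python
# A minimal sketch of action-conditioned video prediction with a split
# between a controllable (action-dependent) branch and a background
# (action-independent) branch. All names and sizes are assumptions.
import torch
import torch.nn as nn

class ActionConditionedPredictor(nn.Module):
    def __init__(self, in_channels=4, num_actions=18, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.action_embed = nn.Embedding(num_actions, hidden)
        # Controllable-object head: modulated by the action embedding.
        # Predicts next-frame pixels plus one extra mask channel.
        self.controllable_head = nn.Conv2d(hidden, in_channels + 1, 3, padding=1)
        # Background head: ignores the action entirely.
        self.background_head = nn.Conv2d(hidden, in_channels, 3, padding=1)

    def forward(self, obs, action):
        h = self.encoder(obs)
        a = self.action_embed(action)[:, :, None, None]    # broadcast over H, W
        ctrl = self.controllable_head(h * a)               # action-modulated branch
        bg = self.background_head(h)                       # action-independent branch
        mask = torch.sigmoid(ctrl[:, -1:])                 # soft controllable-object mask
        pred_next = mask * ctrl[:, :-1] + (1 - mask) * bg  # composited next frame
        return pred_next, mask

# Training signal: next-frame reconstruction. The mask is the disentangled
# representation that can be concatenated to the agent's observation.
model = ActionConditionedPredictor()
obs = torch.rand(8, 4, 84, 84)            # batch of stacked Atari frames
action = torch.randint(0, 18, (8,))       # one discrete action per sample
next_obs = torch.rand(8, 4, 84, 84)
pred_next, mask = model(obs, action)
loss = nn.functional.mse_loss(pred_next, next_obs)
augmented_obs = torch.cat([obs, mask], dim=1)  # extra channel for the RL agent

# The abstract also mentions a prediction bonus reward; one plausible
# reading is using per-sample prediction error as an intrinsic bonus.
bonus = (pred_next - next_obs).pow(2).flatten(1).mean(dim=1)  # shape (8,)
```

In this reading, the Double DQN agent would consume augmented_obs in place of the raw frame stack, so the policy network sees where the controllable object is without having to rediscover it from pixels.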
