Paper Title

Learning Object Manipulation Skills via Approximate State Estimation from Real Videos

Authors

Vladimír Petrík, Makarand Tapaswi, Ivan Laptev, Josef Sivic

Abstract

Humans are adept at learning new tasks by watching a few instructional videos. On the other hand, robots that learn new actions either require a lot of effort through trial and error, or use expert demonstrations that are challenging to obtain. In this paper, we explore a method that facilitates learning object manipulation skills directly from videos. Leveraging recent advances in 2D visual recognition and differentiable rendering, we develop an optimization based method to estimate a coarse 3D state representation for the hand and the manipulated object(s) without requiring any supervision. We use these trajectories as dense rewards for an agent that learns to mimic them through reinforcement learning. We evaluate our method on simple single- and two-object actions from the Something-Something dataset. Our approach allows an agent to learn actions from single videos, while watching multiple demonstrations makes the policy more robust. We show that policies learned in a simulated environment can be easily transferred to a real robot.
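To make the dense-reward idea in the abstract concrete, below is a minimal, hypothetical sketch (not the paper's actual implementation): the coarse 3D state trajectory recovered from a video is treated as a reference, and at each simulation step the agent receives a reward that decays with the distance between its current hand/object state and the corresponding reference state. The 6-D state layout, the Gaussian reward shaping, and all names are illustrative assumptions.

```python
import numpy as np


def dense_reward(sim_state, ref_state, sigma=0.1):
    # Reward peaks at 1 when the simulated hand/object state matches the
    # reference state estimated from the video, and decays smoothly with
    # Euclidean distance (Gaussian shaping; an illustrative choice, not
    # necessarily the paper's exact reward).
    dist = np.linalg.norm(sim_state - ref_state)
    return float(np.exp(-dist ** 2 / (2 * sigma ** 2)))


# Hypothetical reference trajectory: T time steps of a coarse 3D state,
# here a 6-D vector (hand xyz + object xyz) recovered from a real video.
rng = np.random.default_rng(0)
T = 50
reference_trajectory = np.cumsum(rng.normal(scale=0.01, size=(T, 6)), axis=0)

# Toy rollout: a stand-in for the simulator in which the agent roughly
# tracks the reference; a real setup would query a physics simulator
# and an RL policy here instead of adding noise to the reference.
episode_return = 0.0
for ref_state in reference_trajectory:
    sim_state = ref_state + rng.normal(scale=0.02, size=6)
    episode_return += dense_reward(sim_state, ref_state)

print(f"Episode return over {T} steps: {episode_return:.2f}")
```

Learning from multiple demonstrations, as described in the abstract, would correspond to combining such per-trajectory rewards over several reference trajectories rather than a single one.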
