Paper Title


Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives

Authors

Groth, Oliver, Hung, Chia-Man, Vedaldi, Andrea, Posner, Ingmar

Abstract


Visuomotor control (VMC) is an effective means of achieving basic manipulation tasks such as pushing or pick-and-place from raw images. Conditioning VMC on desired goal states is a promising way of achieving versatile skill primitives. However, common conditioning schemes either rely on task-specific fine tuning - e.g. using one-shot imitation learning (IL) - or on sampling approaches using a forward model of scene dynamics i.e. model-predictive control (MPC), leaving deployability and planning horizon severely limited. In this paper we propose a conditioning scheme which avoids these pitfalls by learning the controller and its conditioning in an end-to-end manner. Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion and the distance to a given target observation. In contrast to related works, this enables our approach to efficiently perform complex manipulation tasks from raw image observations without predefined control primitives or test time demonstrations. We report significant improvements in task success over representative MPC and IL baselines. We also demonstrate our model's generalisation capabilities in challenging, unseen tasks featuring visual noise, cluttered scenes and unseen object geometries.
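The abstract mentions conditioning the controller on "a dynamic image representation of the robot motion". Dynamic images are a known technique for compressing a frame sequence into a single image via approximate rank pooling, with per-frame coefficients α_t = 2t − T − 1. The sketch below is illustrative only, assuming the simplified rank-pooling form; the function name and normalization choice are not from the paper:

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a video clip into one image via approximate rank pooling.

    frames: array of shape (T, H, W, C) -- a short sequence of observations.
    Returns an (H, W, C) image normalized to [0, 1], where later frames are
    weighted positively and earlier frames negatively, so motion stands out.
    """
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1  # approximate rank-pooling coefficients: -T+1 ... T-1
    # Weighted sum over the time axis: sum_t alpha_t * frame_t
    di = np.tensordot(alpha.astype(np.float64), frames.astype(np.float64), axes=(0, 0))
    # Min-max normalize for use as a network input (one of several possible choices)
    di = (di - di.min()) / (di.max() - di.min() + 1e-8)
    return di
```

Static background pixels receive a near-zero net weight (the coefficients sum to zero), while pixels that change over the clip accumulate signal, which is why such a representation can summarize robot motion for a feed-forward controller.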
