Paper Title

Rapid Pose Label Generation through Sparse Representation of Unknown Objects

Authors

Singh, Rohan Pratap; Benallegue, Mehdi; Yoshiyasu, Yusuke; Kanehiro, Fumio

Abstract

Deep Convolutional Neural Networks (CNNs) have been successfully deployed on robots for 6-DoF object pose estimation through visual perception. However, obtaining labeled data at the scale required for the supervised training of CNNs is a difficult task, exacerbated if the object is novel and a 3D model is unavailable. To this end, this work presents an approach for rapidly generating real-world, pose-annotated RGB-D data for unknown objects. Our method not only circumvents the need for a prior 3D object model (textured or otherwise) but also bypasses complicated setups of fiducial markers, turntables, and sensors. With the help of a human user, we first source minimalistic labelings of an ordered set of arbitrarily chosen keypoints over a set of RGB-D videos. Then, by solving an optimization problem, we combine these labels under a world frame to recover a sparse, keypoint-based representation of the object. The sparse representation leads to the development of a dense model and the pose labels for each image frame in the set of scenes. We show that the sparse model can also be efficiently used for scaling to a large number of new scenes. We demonstrate the practicality of the generated labeled dataset by training a pipeline for 6-DoF object pose estimation and a pixel-wise segmentation network.
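As a concrete illustration of the optimization step the abstract describes, the sketch below jointly recovers per-frame camera poses and a single world-frame keypoint model from sparse 3D keypoint observations (back-projected from human clicks on RGB-D frames). This is a minimal sketch under assumptions, not the authors' implementation: the residual design, variable layout, and all names (`residuals`, `observations`, the noise levels) are hypothetical, and a real pipeline would add robust losses and a proper pose initialization.

```python
# Minimal sketch (assumed, not the paper's code): jointly estimate per-frame
# camera poses and world-frame keypoint positions from sparse 3D keypoint
# observations back-projected from labeled RGB-D frames.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R

def residuals(params, observations, n_frames, n_keypoints):
    """3D residuals between observations mapped to the world frame and the model.

    params = [n_frames * (rotation vector, translation)] + [n_keypoints * xyz].
    observations maps (frame_idx, keypoint_idx) -> 3D point in camera frame.
    """
    poses = params[:6 * n_frames].reshape(n_frames, 6)
    model = params[6 * n_frames:].reshape(n_keypoints, 3)
    res = []
    for (f, k), p_cam in observations.items():
        # Camera-to-world transform for frame f.
        p_world = R.from_rotvec(poses[f, :3]).apply(p_cam) + poses[f, 3:]
        res.append(p_world - model[k])
    return np.concatenate(res)

# Toy data: 4 frames observing 5 keypoints of a synthetic object.
rng = np.random.default_rng(0)
n_frames, n_keypoints = 4, 5
gt_model = rng.uniform(-0.1, 0.1, (n_keypoints, 3))   # keypoints, meters
gt_poses = rng.uniform(-0.5, 0.5, (n_frames, 6))      # rotvec + translation
observations = {}
for f in range(n_frames):
    rot = R.from_rotvec(gt_poses[f, :3])
    for k in range(n_keypoints):
        # Express the world-frame keypoint in the camera frame, add noise
        # to mimic imprecise human clicks and sensor depth error.
        p_cam = rot.inv().apply(gt_model[k] - gt_poses[f, 3:])
        observations[(f, k)] = p_cam + rng.normal(0.0, 1e-3, 3)

# Naive zero initialization; a real pipeline would seed poses from
# frame-to-frame registration or visual odometry to avoid local minima.
x0 = np.zeros(6 * n_frames + 3 * n_keypoints)
sol = least_squares(residuals, x0, args=(observations, n_frames, n_keypoints))
print("final cost:", sol.cost)
```

Note that the recovered world frame is only defined up to a global rigid transform; anchoring one frame's pose (or a subset of keypoints) removes this gauge freedom.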
