6IMPOSE: Bridging the Reality Gap in 6D Pose Estimation for Robotic Grasping

Paper Title

6IMPOSE: Bridging the Reality Gap in 6D Pose Estimation for Robotic Grasping

Authors

Cao, Hongpeng; Dirnberger, Lukas; Bernardini, Daniele; Piazza, Cristina; Caccamo, Marco

Abstract

6D pose recognition has been a crucial factor in the success of robotic grasping, and recent deep-learning-based approaches have achieved remarkable results on benchmarks. However, their generalization capabilities in real-world applications remain unclear. To overcome this gap, we introduce 6IMPOSE, a novel framework for sim-to-real data generation and 6D pose estimation. 6IMPOSE consists of four modules. First, a data generation pipeline that employs the 3D software suite Blender to create synthetic RGBD image datasets with 6D pose annotations. Second, an annotated RGBD dataset of five household objects generated using the proposed pipeline. Third, a real-time two-stage 6D pose estimation approach that integrates the object detector YOLO-V4 and a streamlined, real-time version of the 6D pose estimation algorithm PVN3D optimized for time-sensitive robotics applications. Fourth, a codebase designed to facilitate the integration of the vision system into a robotic grasping experiment. Our approach demonstrates the efficient generation of large amounts of photo-realistic RGBD images and the successful transfer of the trained inference model to robotic grasping experiments, achieving an overall success rate of 87% in grasping five different household objects from cluttered backgrounds under varying lighting conditions. This is made possible by fine-tuning the data generation and domain randomization techniques and by optimizing the inference pipeline, which overcomes the generalization and performance shortcomings of the original PVN3D algorithm. Finally, we make the code, synthetic dataset, and all the pretrained models available on GitHub.
