Stereo Vision Based Single-Shot 6D Object Pose Estimation for Bin-Picking by a Robot Manipulator

Paper Title

Stereo Vision Based Single-Shot 6D Object Pose Estimation for Bin-Picking by a Robot Manipulator

Authors

Nakano, Yoshihiro

Abstract

We propose a fast and accurate method of 6D object pose estimation for bin-picking of mechanical parts by a robot manipulator. We extend the single-shot approach to stereo vision by applying an attention architecture. Our convolutional neural network model regresses object locations and rotations from either the left or the right image, without depth information. Then, a stereo feature matching module, designated as Stereo Grid Attention, generates stereo grid matching maps. The key point of our method is that disparity is calculated only for the objects found by the attention in the stereo images, instead of calculating a point cloud over the entire image. The disparity value is then used to calculate the depth to each object by the principle of triangulation. Thanks to the single-shot architecture, our method also achieves rapid pose estimation: a 1024 x 1024 pixel image can be processed in 75 milliseconds on the Jetson AGX Xavier with a half-float model. Weakly textured mechanical parts are used to exemplify the method. First, we create original synthetic datasets for training and evaluating the proposed model. These datasets are created by capturing and rendering numerous 3D models of several types of mechanical parts in virtual space. Finally, we use a robotic manipulator with an electromagnetic gripper to pick up mechanical parts in a cluttered state to verify the validity of our method in an actual scene. When the proposed method is applied to raw stereo images from our stereo camera to detect black steel screws, stainless screws, and DC motor parts (i.e., cases, rotor cores, and commutator caps), the bin-picking tasks succeed with 76.3%, 64.0%, 50.5%, 89.1%, and 64.2% probability, respectively.
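The triangulation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a rectified pinhole stereo pair, where depth follows Z = f * B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). The function names, the camera parameters, and the numbers in the usage lines are hypothetical and are not taken from the paper; the paper's actual implementation may differ.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth along the optical axis for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


def back_project(u_px, v_px, depth_m, focal_length_px, cx_px, cy_px):
    """Recover the 3D point (X, Y, Z) in the camera frame from a pixel and its depth."""
    x = (u_px - cx_px) * depth_m / focal_length_px
    y = (v_px - cy_px) * depth_m / focal_length_px
    return (x, y, depth_m)


# Hypothetical values: f = 1000 px, B = 0.1 m, principal point at the image center.
z = depth_from_disparity(disparity_px=100.0, focal_length_px=1000.0, baseline_m=0.1)
point = back_project(612.0, 512.0, z, focal_length_px=1000.0, cx_px=512.0, cy_px=512.0)
print(z, point)
```

Because disparity is needed only at the detected object locations, this computation runs once per object rather than once per pixel, which is consistent with the abstract's point about avoiding a full-image point cloud.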

Paper Title