Paper Title
G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features
Paper Authors
Paper Abstract
In this paper, we propose a novel real-time 6D object pose estimation framework, named G2L-Net. Our network operates on point clouds obtained from RGB-D detection in a divide-and-conquer fashion. Specifically, our network consists of three steps. First, we extract the coarse object point cloud from the RGB-D image by 2D detection. Second, we feed the coarse object point cloud to a translation localization network to perform 3D segmentation and object translation prediction. Third, via the predicted segmentation and translation, we transfer the fine object point cloud into a local canonical coordinate frame, in which we train a rotation localization network to estimate the initial object rotation. In the third step, we define point-wise embedding vector features to capture viewpoint-aware information. To calculate a more accurate rotation, we adopt a rotation residual estimator that predicts the residual between the initial rotation and the ground truth, which boosts the initial pose estimation performance. Despite the multiple steps stacked in the proposed coarse-to-fine framework, G2L-Net runs in real time. Extensive experiments on two benchmark datasets show that G2L-Net achieves state-of-the-art performance in terms of both accuracy and speed.
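The geometric data flow between the three steps described above can be sketched as follows. This is a minimal, hypothetical illustration: the learned components (2D detector, translation localization network, rotation localization network with embedding vector features, rotation residual estimator) are replaced by plain geometric stand-ins, and all function names and parameters are assumptions, not the authors' API.

```python
import numpy as np

def crop_points_by_2d_box(points, box2d, intrinsics):
    """Step 1 (stand-in for 2D detection): keep the coarse object point
    cloud, i.e. points whose pinhole projection falls inside the 2D box."""
    fx, fy, cx, cy = intrinsics
    u = points[:, 0] / points[:, 2] * fx + cx
    v = points[:, 1] / points[:, 2] * fy + cy
    x0, y0, x1, y1 = box2d
    mask = (u >= x0) & (u <= x1) & (v >= y0) & (v <= y1)
    return points[mask]

def to_local_canonical(points, seg_mask, pred_translation):
    """Step 2 -> Step 3 hand-off: use the predicted 3D segmentation to
    select the fine object point cloud, then subtract the predicted
    translation to move it into the local canonical coordinate frame."""
    obj_points = points[seg_mask]
    return obj_points - pred_translation

def apply_pose(canonical_points, init_rotation, residual_rotation, translation):
    """Step 3 output: compose the initial rotation with the predicted
    residual rotation, then add the translation back to recover the
    object points in camera coordinates."""
    rotation = residual_rotation @ init_rotation
    return canonical_points @ rotation.T + translation
```

The sketch makes the coarse-to-fine structure explicit: each step shrinks the problem for the next one (a 2D box reduces the scene to a coarse cloud, segmentation plus translation reduce it to a canonical-frame cloud), which is what keeps the stacked pipeline cheap enough for real time.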