Paper Title
G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features
Paper Authors
Paper Abstract
In this paper, we propose a novel real-time 6D object pose estimation framework, named G2L-Net. Our network operates on point clouds obtained from RGB-D detection in a divide-and-conquer fashion. Specifically, our network consists of three steps. First, we extract the coarse object point cloud from the RGB-D image by 2D detection. Second, we feed the coarse object point cloud to a translation localization network to perform 3D segmentation and object translation prediction. Third, via the predicted segmentation and translation, we transfer the fine object point cloud into a local canonical coordinate frame, in which we train a rotation localization network to estimate the initial object rotation. In the third step, we define point-wise embedding vector features to capture viewpoint-aware information. To calculate a more accurate rotation, we adopt a rotation residual estimator that predicts the residual between the initial rotation and the ground truth, which boosts the initial pose estimation performance. Despite the multiple steps stacked in the proposed coarse-to-fine framework, G2L-Net runs in real time. Extensive experiments on two benchmark datasets show that G2L-Net achieves state-of-the-art performance in terms of both accuracy and speed.
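The geometric data flow between the three steps described above can be sketched as follows. This is a minimal, hypothetical illustration: the learned components (2D detector, translation localization network, rotation localization network with embedding vector features, rotation residual estimator) are replaced by plain geometric stand-ins, and all function names and parameters are assumptions, not the authors' API.

```python
import numpy as np

def crop_points_by_2d_box(points, box2d, intrinsics):
    """Step 1 (stand-in for 2D detection): keep the coarse object point
    cloud, i.e. points whose pinhole projection falls inside the 2D box."""
    fx, fy, cx, cy = intrinsics
    u = points[:, 0] / points[:, 2] * fx + cx
    v = points[:, 1] / points[:, 2] * fy + cy
    x0, y0, x1, y1 = box2d
    mask = (u >= x0) & (u <= x1) & (v >= y0) & (v <= y1)
    return points[mask]

def to_local_canonical(points, seg_mask, pred_translation):
    """Step 2 -> Step 3 hand-off: use the predicted 3D segmentation to
    select the fine object point cloud, then subtract the predicted
    translation to move it into the local canonical coordinate frame."""
    obj_points = points[seg_mask]
    return obj_points - pred_translation

def apply_pose(canonical_points, init_rotation, residual_rotation, translation):
    """Step 3 output: compose the initial rotation with the predicted
    residual rotation, then add the translation back to recover the
    object points in camera coordinates."""
    rotation = residual_rotation @ init_rotation
    return canonical_points @ rotation.T + translation
```

The sketch makes the coarse-to-fine structure explicit: each step shrinks the problem for the next one (a 2D box reduces the scene to a coarse cloud, segmentation plus translation reduce it to a canonical-frame cloud), which is what keeps the stacked pipeline cheap enough for real time.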