Paper Title
SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
Authors
Abstract
Estimating the 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In the case of monocular vision, successful methods have mainly been based on two ingredients: (i) a network generating 2D region proposals, and (ii) an R-CNN structure predicting 3D object pose by utilizing the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, in this paper we propose a novel 3D object detection method, named SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection techniques, our method does not require complicated pre/post-processing, extra data, or a refinement stage. Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset, achieving state-of-the-art results on both 3D object detection and bird's-eye-view evaluation. The code will be made publicly available.
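To make the idea of "a single keypoint estimate combined with regressed 3D variables" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard pinhole camera, that the keypoint is the image projection of the box center, and that depth, dimensions, and yaw are already available as plain numbers. The function names, parameter layout, and corner ordering are hypothetical; the abstract does not specify how SMOKE parameterizes or decodes these quantities.

```python
import math

def backproject_keypoint(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) at the given depth to a 3D point in the camera frame.

    Assumes a pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy); these names are illustrative, not from the paper.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def box_corners(center, dims, yaw):
    """Return the 8 corners of a 3D box as (x, y, z) tuples.

    center: box centroid in the camera frame (an assumption of this sketch).
    dims:   (height, width, length) of the box.
    yaw:    rotation about the camera's vertical (Y) axis, in radians.
    """
    h, w, l = dims
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (1, -1):
        for sy in (1, -1):
            for sz in (1, -1):
                # Corner offset in the object's own frame.
                x, y, z = sx * l / 2, sy * h / 2, sz * w / 2
                # Rotate about the Y axis, then translate to the center.
                xr = cos_y * x + sin_y * z
                zr = -sin_y * x + cos_y * z
                corners.append((center[0] + xr, center[1] + y, center[2] + zr))
    return corners

# Hypothetical intrinsics and predictions, chosen only for illustration.
center = backproject_keypoint(600.0, 180.0, 20.0, 700.0, 700.0, 600.0, 180.0)
corners = box_corners(center, (1.5, 1.6, 4.0), 0.0)
```

In this sketch the only image-space quantity is the keypoint; everything else needed for the box comes from the regressed values, which is why no separate 2D region proposal step appears.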