Paper Title
Cross-Modality 3D Object Detection
Paper Authors
Paper Abstract
In this paper, we focus on exploring the fusion of images and point clouds for 3D object detection, in view of the complementary nature of the two modalities: images carry richer semantic information, while point clouds specialize in distance sensing. To this end, we present a novel two-stage multi-modal fusion network for 3D object detection that takes both binocular images and raw point clouds as input. The overall architecture performs fusion in two stages. The first stage produces 3D proposals through sparse point-wise feature fusion. Within this stage, we further exploit a joint anchor mechanism that enables the network to perform 2D and 3D classification and regression simultaneously for better proposal generation. The second stage operates on the 2D and 3D proposal regions and fuses their dense features. In addition, we propose to use pseudo-LiDAR points from stereo matching as a data augmentation method to densify the LiDAR points, motivated by our observation that the objects missed by the detection network mostly have too few points, especially far-away objects. Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network learn better representations.
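To make the first-stage fusion concrete, below is a minimal PyTorch sketch of sparse point-wise feature fusion: each LiDAR point is projected into the image plane with the camera projection matrix, the image feature map is bilinearly sampled at that pixel, and the sampled feature is concatenated with the point's own feature. The function name, tensor layouts, and calibration convention are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def pointwise_feature_fusion(points_xyz, point_feats, image_feats, proj_matrix):
    """Hypothetical sketch of sparse point-wise fusion.

    points_xyz:  (N, 3) LiDAR points in the camera frame (assumed layout)
    point_feats: (N, Cp) per-point features from a point-cloud backbone
    image_feats: (1, Ci, H, W) feature map from an image backbone
    proj_matrix: (3, 4) camera projection matrix (e.g., KITTI P2)
    """
    n = points_xyz.shape[0]
    ones = torch.ones(n, 1, device=points_xyz.device)
    pts_hom = torch.cat([points_xyz, ones], dim=1)      # (N, 4) homogeneous
    uvw = pts_hom @ proj_matrix.t()                     # (N, 3)
    # In practice, points behind the camera (uvw[:, 2] <= 0) should be filtered.
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)       # pixel coordinates

    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    _, _, h, w = image_feats.shape
    grid = torch.empty(n, 2, device=points_xyz.device)
    grid[:, 0] = uv[:, 0] / (w - 1) * 2 - 1
    grid[:, 1] = uv[:, 1] / (h - 1) * 2 - 1
    grid = grid.view(1, n, 1, 2)

    sampled = F.grid_sample(image_feats, grid, align_corners=True)  # (1, Ci, N, 1)
    img_point_feats = sampled.squeeze(3).squeeze(0).t()             # (N, Ci)

    # Concatenate image and point features for each 3D point.
    return torch.cat([point_feats, img_point_feats], dim=1)         # (N, Cp+Ci)
```

The fused (N, Cp+Ci) features would then feed the proposal head; the second-stage dense fusion follows the same project-and-sample idea but over whole proposal regions rather than individual points.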
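The pseudo-LiDAR augmentation can likewise be sketched: a stereo-matching network produces a disparity map, which is converted to depth and back-projected into a 3D point cloud that densifies the real LiDAR scan. The sketch below assumes a standard pinhole camera model; the parameter names (f, cx, cy, baseline) and the depth cutoff are illustrative.

```python
import numpy as np

def pseudo_lidar_from_disparity(disparity, f, cx, cy, baseline, max_depth=80.0):
    """Hypothetical sketch: back-project a stereo disparity map to 3D points.

    disparity: (H, W) disparity map from a stereo-matching network
    f:         focal length in pixels
    cx, cy:    principal point in pixels
    baseline:  stereo baseline in meters
    """
    h, w = disparity.shape
    depth = f * baseline / np.clip(disparity, 1e-3, None)  # z = f * b / d

    # Back-project each pixel (u, v) with its depth into camera coordinates.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    # Keep only points within a plausible sensing range.
    return points[points[:, 2] < max_depth]
```

For augmentation, the generated points would be merged with the raw scan before training, e.g. `np.vstack([lidar_points, pseudo_points])`, which is what makes the sparse far-away objects denser for the detector.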