Paper Title
SRCN3D: Sparse R-CNN 3D for Compact Convolutional Multi-View 3D Object Detection and Tracking
Paper Authors
Paper Abstract
Detection and tracking of moving objects is an essential component of environmental perception for autonomous driving. In the flourishing field of multi-view camera-based 3D detectors, various transformer-based pipelines learn queries in 3D space from the 2D feature maps of perspective views, but the dominant dense BEV query mechanism is computationally inefficient. This paper proposes Sparse R-CNN 3D (SRCN3D), a novel two-stage fully-sparse detector that incorporates sparse queries, sparse attention with box-wise sampling, and sparse prediction. SRCN3D adopts a cascade structure with twin-track updates of both a fixed number of query boxes and latent query features. Our novel sparse feature sampling module utilizes only local 2D region-of-interest (RoI) features, computed by projecting the 3D query boxes onto the image planes, for further box refinement, leading to a fully-convolutional and deployment-friendly pipeline. For multi-object tracking, motion features, query features and RoI features are comprehensively utilized in multi-hypotheses data association. Extensive experiments on the nuScenes dataset demonstrate that SRCN3D achieves competitive performance in both 3D object detection and multi-object tracking tasks, while also exhibiting superior efficiency compared to transformer-based methods. Code and models are available at https://github.com/synsin0/SRCN3D.
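The core of the sparse feature sampling described above is projecting each 3D query box into a camera view and pooling features only from the enclosed 2D region. The sketch below illustrates that idea with PyTorch and torchvision's `roi_align`; the helper names (`corners_from_boxes`, `sample_roi_features`, `lidar2img`) and the box parameterization are assumptions for illustration, not the repository's actual API.

```python
# Minimal sketch of box-wise sparse feature sampling for one camera view.
# Assumes a pinhole camera model and boxes given as [x, y, z, w, l, h, yaw];
# behind-camera handling and multi-view fusion are omitted for brevity.
import torch
from torchvision.ops import roi_align


def corners_from_boxes(boxes):
    """boxes: (N, 7) tensor [x, y, z, w, l, h, yaw] -> (N, 8, 3) corner points."""
    x, y, z, w, l, h, yaw = boxes.unbind(-1)
    # Unit-cube corner offsets, scaled by each box's size.
    offsets = torch.tensor(
        [[sx, sy, sz] for sx in (-0.5, 0.5) for sy in (-0.5, 0.5) for sz in (-0.5, 0.5)],
        dtype=boxes.dtype, device=boxes.device)                      # (8, 3)
    dims = torch.stack([w, l, h], dim=-1)                            # (N, 3)
    corners = offsets[None] * dims[:, None]                          # (N, 8, 3)
    # Rotation about the z-axis by the yaw angle.
    cos, sin, zero, one = yaw.cos(), yaw.sin(), torch.zeros_like(yaw), torch.ones_like(yaw)
    rot = torch.stack([cos, -sin, zero,
                       sin,  cos, zero,
                       zero, zero, one], dim=-1).view(-1, 3, 3)
    corners = corners @ rot.transpose(1, 2)
    return corners + torch.stack([x, y, z], dim=-1)[:, None]


def sample_roi_features(query_boxes, feat_map, lidar2img, img_hw, out_size=7):
    """Project 3D query boxes into one camera and pool local RoI features.

    query_boxes: (N, 7)        3D boxes in the ego/LiDAR frame
    feat_map:    (1, C, Hf, Wf) feature map of one camera view
    lidar2img:   (4, 4)        projection matrix (intrinsics @ extrinsics)
    img_hw:      (H, W)        original image size, used to derive the stride
    """
    corners = corners_from_boxes(query_boxes)                        # (N, 8, 3)
    homo = torch.cat([corners, torch.ones_like(corners[..., :1])], dim=-1)
    pts = homo @ lidar2img.T                                         # (N, 8, 4)
    depth = pts[..., 2].clamp(min=1e-5)
    uv = pts[..., :2] / depth[..., None]                             # pixel coordinates

    # Axis-aligned 2D RoI enclosing the projected corners, with batch index 0.
    x1y1 = uv.min(dim=1).values
    x2y2 = uv.max(dim=1).values
    rois = torch.cat([torch.zeros_like(x1y1[:, :1]), x1y1, x2y2], dim=-1)  # (N, 5)

    stride = img_hw[1] / feat_map.shape[-1]
    roi_feats = roi_align(feat_map, rois, output_size=out_size,
                          spatial_scale=1.0 / stride, aligned=True)  # (N, C, 7, 7)
    return roi_feats
```

Because the pooled features are purely convolutional RoI crops rather than global attention maps, each refinement stage of the cascade only touches a small, fixed-size slice of the feature map per query, which is what keeps the pipeline sparse and deployment-friendly.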
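For the tracking stage, the abstract states that motion, query, and RoI features are combined in multi-hypotheses data association. The following is a minimal sketch of one plausible fusion scheme, combining a BEV center-distance cost with cosine-distance costs on the two embeddings before Hungarian matching; the fusion weights and function names are assumptions, not the paper's exact formulation.

```python
# Illustrative multi-cue data association: fuse motion, query-feature and
# RoI-feature affinities into one cost matrix, then solve it with the
# Hungarian algorithm (scipy.optimize.linear_sum_assignment).
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment


def associate(track_centers, det_centers,
              track_query, det_query,
              track_roi, det_roi,
              w_motion=1.0, w_query=1.0, w_roi=1.0):
    """Return a list of (track_idx, det_idx) pairs minimizing the fused cost.

    track_*: tensors for T existing tracks; det_*: tensors for D new detections.
    Centers are (.., 2) BEV positions; query/RoI features are flat embeddings.
    """
    # Motion cost: BEV center distance between predicted tracks and detections.
    motion_cost = torch.cdist(track_centers, det_centers)            # (T, D)

    # Appearance costs: 1 - cosine similarity of query / RoI embeddings.
    query_cost = 1 - F.normalize(track_query, dim=-1) @ F.normalize(det_query, dim=-1).T
    roi_cost = 1 - F.normalize(track_roi, dim=-1) @ F.normalize(det_roi, dim=-1).T

    cost = w_motion * motion_cost + w_query * query_cost + w_roi * roi_cost
    rows, cols = linear_sum_assignment(cost.cpu().numpy())
    return list(zip(rows.tolist(), cols.tolist()))
```

In practice a real tracker would also gate matches whose fused cost exceeds a threshold and spawn or terminate tracks for unmatched detections and tracks; those bookkeeping steps are omitted here.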