Boosting Single-Frame 3D Object Detection by Simulating Multi-Frame Point Clouds

Paper Title

Boosting Single-Frame 3D Object Detection by Simulating Multi-Frame Point Clouds

Authors

Wu Zheng, Li Jiang, Fanbin Lu, Yangyang Ye, Chi-Wing Fu

Abstract

To boost a detector for single-frame 3D object detection, we present a new approach to train it to simulate features and responses following a detector trained on multi-frame point clouds. Our approach needs multi-frame point clouds only when training the single-frame detector, and once trained, it can detect objects with only single-frame point clouds as inputs during the inference. We design a novel Simulated Multi-Frame Single-Stage object Detector (SMF-SSD) framework to realize the approach: multi-view dense object fusion to densify ground-truth objects to generate a multi-frame point cloud; self-attention voxel distillation to facilitate one-to-many knowledge transfer from multi- to single-frame voxels; multi-scale BEV feature distillation to transfer knowledge in low-level spatial and high-level semantic BEV features; and adaptive response distillation to activate single-frame responses of high confidence and accurate localization. Experimental results on the Waymo test set show that our SMF-SSD consistently outperforms all state-of-the-art single-frame 3D object detectors for all object classes of difficulty levels 1 and 2 in terms of both mAP and mAPH.
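The abstract names multi-scale BEV feature distillation but does not state its loss. As a rough illustration of the general idea only, the sketch below assumes a plain weighted mean-squared error between teacher (multi-frame) and student (single-frame) feature maps at each scale. The function names, the flattened-list representation, and the MSE form are all assumptions for illustration, not the formulation used in SMF-SSD.

```python
from typing import Optional, Sequence


def mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between two flattened feature maps of equal length."""
    if len(student) != len(teacher):
        raise ValueError("feature maps must have the same number of elements")
    if len(student) == 0:
        raise ValueError("feature maps must not be empty")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def multi_scale_distill_loss(
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-scale MSE terms (hypothetical loss form).

    Each entry of student_feats / teacher_feats is one scale's feature map,
    flattened to a list of floats. The teacher is treated as a fixed target.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("student and teacher must have the same number of scales")
    if weights is None:
        weights = [1.0] * len(student_feats)  # assumption: equal weight per scale
    if len(weights) != len(student_feats):
        raise ValueError("need exactly one weight per scale")
    return sum(
        w * mse(s, t) for w, s, t in zip(weights, student_feats, teacher_feats)
    )


# Toy example: two scales with different resolutions.
student = [[0.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
teacher = [[1.0, 1.0], [2.0, 2.0, 2.0, 4.0]]
loss = multi_scale_distill_loss(student, teacher)
```

In a real training loop, a term like this would typically be added to the detection loss while the multi-frame teacher stays frozen, which is consistent with the abstract's point that multi-frame input is needed only during training.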
