3D Object Detection with a Self-supervised Lidar Scene Flow Backbone

Paper title

3D Object Detection with a Self-supervised Lidar Scene Flow Backbone

Authors

Yurtsever, Ekim; Erçelik, Emeç; Liu, Mingyu; Yang, Zhijie; Zhang, Hanzhen; Topçam, Pınar; Listl, Maximilian; Çaylı, Yılmaz Kaan; Knoll, Alois

Abstract

State-of-the-art lidar-based 3D object detection methods rely on supervised learning and large labeled datasets. However, annotating lidar data is resource-consuming, and depending only on supervised learning limits the applicability of trained models. Self-supervised training strategies can alleviate these issues by learning a general point cloud backbone model for downstream 3D vision tasks. Against this backdrop, we show the relationship between self-supervised multi-frame flow representations and single-frame 3D detection hypotheses. Our main contribution leverages learned flow and motion representations and combines a self-supervised backbone with a supervised 3D detection head. First, a self-supervised scene flow estimation model is trained with cycle consistency. Then, the point cloud encoder of this model is used as the backbone of a single-frame 3D object detection head model. This second 3D object detection model learns to utilize motion representations to distinguish dynamic objects exhibiting different movement patterns. Experiments on KITTI and nuScenes benchmarks show that the proposed self-supervised pre-training increases 3D detection performance significantly. https://github.com/emecercelik/ssl-3d-detection.git
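The cycle-consistency idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `FlowFn` interface, and the nearest-neighbor stand-in for a learned scene flow model are all hypothetical and chosen only for illustration. The sketch warps points from one frame to the next with a forward flow, warps them back with a backward flow, and penalizes the distance between the cycled points and the originals, so no flow labels are needed.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
# Hypothetical interface: given a source and a target cloud,
# return one flow vector per source point.
FlowFn = Callable[[List[Point], List[Point]], List[Point]]


def _add(p: Point, q: Point) -> Point:
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])


def _sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_neighbor_flow(source: List[Point], target: List[Point]) -> List[Point]:
    """Toy stand-in for a learned flow model (an assumption, not the paper's network)."""
    flows = []
    for p in source:
        nearest = min(target, key=lambda q: _sq_dist(p, q))
        flows.append((nearest[0] - p[0], nearest[1] - p[1], nearest[2] - p[2]))
    return flows


def cycle_consistency_loss(flow_fn: FlowFn, frame_a: List[Point], frame_b: List[Point]) -> float:
    # Forward pass: estimate flow from frame A to frame B and warp A with it.
    forward = flow_fn(frame_a, frame_b)
    warped = [_add(p, f) for p, f in zip(frame_a, forward)]
    # Backward pass: estimate flow from the warped points back to frame A.
    backward = flow_fn(warped, frame_a)
    cycled = [_add(p, f) for p, f in zip(warped, backward)]
    # Mean squared distance between the cycled points and the originals.
    return sum(_sq_dist(c, p) for c, p in zip(cycled, frame_a)) / len(frame_a)


# Three well-separated points, all moved by one unit along x between frames.
frame_a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
frame_b = [_add(p, (1.0, 0.0, 0.0)) for p in frame_a]
loss = cycle_consistency_loss(nearest_neighbor_flow, frame_a, frame_b)
print(loss)  # 0.0 for this toy case, since the cycle returns every point to its start
```

In the two-stage setup the abstract describes, a loss of this kind would train the scene flow model, after which its point cloud encoder is reused as the backbone of the detection model. The sketch covers only the loss, not the encoder or the detection head.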
