Paper Title


MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds

Paper Authors

Georg Krispel, David Schinagl, Christian Fruhwirth-Reisinger, Horst Possegger, Horst Bischof

Paper Abstract


The sensing process of large-scale LiDAR point clouds inevitably causes large blind spots, i.e. regions not visible to the sensor. We demonstrate how these inherent sampling properties can be effectively utilized for self-supervised representation learning by designing a highly effective pre-training framework that considerably reduces the need for tedious 3D annotations to train state-of-the-art object detectors. Our Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction. This results in more expressive and useful initialization, which can be directly applied to downstream perception tasks, such as 3D object detection or semantic segmentation for autonomous driving. In a novel reconstruction approach, MAELi distinguishes between empty and occluded space and employs a new masking strategy that targets the LiDAR's inherent spherical projection. Thereby, without any ground truth whatsoever and trained on single frames only, MAELi obtains an understanding of the underlying 3D scene geometry and semantics. To demonstrate the potential of MAELi, we pre-train backbones in an end-to-end manner and show the effectiveness of our unsupervised pre-trained weights on the tasks of 3D object detection and semantic segmentation.
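For intuition only, the following is a minimal sketch, not the authors' implementation, of two ideas mentioned in the abstract: masking a scan through its spherical (range-image) projection, and labelling space along each LiDAR ray as empty (before the measured return) or occluded (behind it). All names and parameter choices here (grid resolution, mask ratio, voxel size) are illustrative assumptions.

```python
# Illustrative sketch (NOT the MAELi implementation): spherical-projection
# masking of a LiDAR frame and per-ray empty/occluded labelling.
# Parameter values are hypothetical.
import numpy as np

def spherical_mask(points, az_bins=360, el_bins=32, mask_ratio=0.7, seed=0):
    """Drop random spherical-projection cells and the points that fall in them.

    points: (N, 3) array of x, y, z coordinates from a single LiDAR frame.
    Returns the visible (unmasked) points and a boolean keep-mask over N.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    az = np.arctan2(y, x)                                         # azimuth in [-pi, pi]
    el = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))   # elevation

    # Discretize into a coarse range-image grid (cell size is an assumption).
    az_idx = np.clip(((az + np.pi) / (2 * np.pi) * az_bins).astype(int), 0, az_bins - 1)
    el_span = el.max() - el.min() + 1e-6
    el_idx = np.clip(((el - el.min()) / el_span * el_bins).astype(int), 0, el_bins - 1)
    cell = az_idx * el_bins + el_idx

    # Randomly mask whole cells, mimicking patch masking on the spherical grid.
    rng = np.random.default_rng(seed)
    masked_cells = rng.random(az_bins * el_bins) < mask_ratio
    keep = ~masked_cells[cell]
    return points[keep], keep

def ray_occupancy(point, voxel_size=0.5):
    """Label sample positions along one ray as empty (0) or occluded (1).

    Space between the sensor origin and the measured return is observed empty;
    space behind the return along the same ray is unobserved, i.e. occluded.
    """
    r = np.linalg.norm(point)
    direction = point / r
    # Sample positions along the ray, continuing slightly past the return.
    steps = np.arange(voxel_size / 2, r + 3 * voxel_size, voxel_size)
    labels = (steps > r).astype(int)           # 0 = empty, 1 = occluded
    return steps[:, None] * direction, labels
```

Given a frame `points` of shape (N, 3), `spherical_mask(points)` returns the visible subset an encoder would see under such a masking scheme, while `ray_occupancy` produces empty/occluded labels along a single ray of the kind a reconstruction target could distinguish.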
