Paper Title
BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection
Paper Authors
Paper Abstract
Recently, the Bird's-Eye-View (BEV) representation has gained increasing attention in multi-view 3D object detection, with promising applications in autonomous driving. Although multi-view camera systems can be deployed at low cost, the lack of depth information forces current approaches to adopt large models to achieve good performance. Therefore, it is essential to improve the efficiency of BEV 3D object detection. Knowledge Distillation (KD) is one of the most practical techniques for training efficient yet accurate models. However, to the best of our knowledge, KD for BEV 3D object detection remains under-explored. Unlike image classification, BEV 3D object detection approaches are more complicated and consist of several components. In this paper, we propose a unified framework named BEV-LGKD to transfer knowledge in a teacher-student manner. However, directly applying the teacher-student paradigm to BEV features fails to achieve satisfactory results due to the heavy background information in RGB camera features. To solve this problem, we propose to leverage the localization advantage of LiDAR points. Specifically, we transform the LiDAR points to BEV space and generate a foreground mask and a view-dependent mask for the teacher-student paradigm. Note that our method only uses LiDAR points to guide the KD between RGB models. As the quality of depth estimation is crucial for BEV perception, we further introduce depth distillation into our framework. Our unified framework is simple yet effective and achieves a significant performance boost. Code will be released.
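The core idea of the abstract — rasterizing LiDAR points into a BEV foreground mask and using it to re-weight the feature-distillation loss so that background regions are suppressed — can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the function names, the BEV grid resolution, and the masked-MSE loss form are all assumptions on our part.

```python
import numpy as np

def lidar_bev_foreground_mask(points, bev_shape, bev_range):
    """Rasterize LiDAR points into a binary BEV occupancy mask.

    points:    (N, 3) array of (x, y, z) coordinates in meters.
    bev_shape: (H, W) BEV grid resolution (assumed here).
    bev_range: (x_min, x_max, y_min, y_max) spatial extent in meters.
    """
    H, W = bev_shape
    x_min, x_max, y_min, y_max = bev_range
    # Map metric coordinates to integer grid cells, clipping to the grid.
    xs = ((points[:, 0] - x_min) / (x_max - x_min) * W).astype(int).clip(0, W - 1)
    ys = ((points[:, 1] - y_min) / (y_max - y_min) * H).astype(int).clip(0, H - 1)
    mask = np.zeros((H, W), dtype=np.float32)
    mask[ys, xs] = 1.0  # cells containing at least one LiDAR point
    return mask

def masked_bev_distill_loss(student_feat, teacher_feat, mask):
    """Squared error between student and teacher BEV features,
    re-weighted by the LiDAR-derived mask so that cells without
    LiDAR evidence (mostly background) contribute nothing.

    student_feat, teacher_feat: (C, H, W) feature maps.
    mask: (H, W), broadcast over the channel dimension.
    """
    diff = (student_feat - teacher_feat) ** 2
    # Normalize by the number of active cells times channels.
    denom = mask.sum() * student_feat.shape[0] + 1e-6
    return float((diff * mask[None]).sum() / denom)
```

Note that the mask only gates *where* the distillation loss is applied; the teacher and student themselves remain camera-only models, consistent with the abstract's statement that LiDAR is used purely to guide KD between RGB models.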