Paper Title
BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud
Paper Authors
Paper Abstract
Bird's-eye-view (BEV) is a powerful and widely adopted representation for road scenes that captures surrounding objects and their spatial locations, along with the overall context of the scene. In this work, we focus on bird's-eye-view semantic segmentation, a task that predicts pixel-wise semantic labels in BEV from side-view RGB images. This task is made possible by simulators such as Carla, which allow for cheap data collection, arbitrary camera placements, and supervision in ways otherwise not possible in the real world. The task poses two main challenges: the view transformation from side view to bird's eye view, and transfer learning to unseen domains. Existing work transforms between views through fully connected layers and performs transfer learning via GANs, which suffers from a lack of depth reasoning and from performance degradation across domains. Our novel two-stage perception pipeline explicitly predicts pixel depths and combines them with pixel semantics in an efficient manner, allowing the model to leverage depth information to infer objects' spatial locations in the BEV. In addition, we enable transfer learning by abstracting high-level geometric features and predicting an intermediate representation that is common across different domains. We publish a new dataset called BEVSEG-Carla and show that our approach improves the state of the art by 24% mIoU and performs well when transferred to a new domain.
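To make the view transformation concrete, below is a minimal sketch (not the authors' released code) of the geometric step at the heart of such a two-stage pipeline: per-pixel depth and semantic predictions from the side-view image are unprojected into a semantic point cloud using the camera intrinsics, then rasterized onto a BEV grid. The function name, intrinsics, and grid parameters are hypothetical; in the actual pipeline both the depth/semantics predictions and the final BEV segmentation come from learned networks.

```python
# Hypothetical sketch of the geometric view transformation:
# lift per-pixel depth + semantics into a semantic point cloud,
# then rasterize it onto a bird's-eye-view grid.
import numpy as np

def semantics_to_bev(depth, labels, fx, fy, cx, cy,
                     bev_size=200, meters_per_cell=0.25, max_range=50.0):
    """depth: (H, W) metric depths; labels: (H, W) integer class ids.
    Returns a (bev_size, bev_size) BEV label map; 255 marks unobserved cells."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Unproject pixels into camera coordinates (x right, z forward).
    x = (u - cx) / fx * depth
    z = depth
    # Keep points in front of the camera and within range.
    keep = (z > 0) & (z < max_range)
    x, z, cls = x[keep], z[keep], labels[keep]
    # Map ground-plane coordinates (x, z) to BEV grid cells,
    # with the camera at the bottom-center of the grid.
    col = np.round(x / meters_per_cell + bev_size / 2).astype(int)
    row = np.round(bev_size - 1 - z / meters_per_cell).astype(int)
    inside = (col >= 0) & (col < bev_size) & (row >= 0) & (row < bev_size)
    bev = np.full((bev_size, bev_size), 255, dtype=np.uint8)
    bev[row[inside], col[inside]] = cls[inside]
    return bev
```

This rasterization is sparse and noisy on its own; in a two-stage pipeline of the kind the abstract describes, a second-stage segmentation network would refine such an intermediate BEV representation into dense labels, which is also what makes the representation transferable across domains.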