Paper Title
Visual Attention-based Self-supervised Absolute Depth Estimation using Geometric Priors in Autonomous Driving
Paper Authors
Paper Abstract
Although existing monocular depth estimation methods have made great progress, predicting an accurate absolute depth map from a single image remains challenging due to the limited modeling capacity of networks and the scale ambiguity issue. In this paper, we introduce a fully Visual Attention-based Depth (VADepth) network, in which spatial attention and channel attention are applied at all stages. By continuously extracting long-range dependencies of features along the spatial and channel dimensions, the VADepth network can effectively preserve important details and suppress interfering features, better perceiving the scene structure for more accurate depth estimates. In addition, we utilize geometric priors to form scale constraints for scale-aware model training. Specifically, we construct a novel scale-aware loss using the distance between the camera and a plane fitted to the ground points corresponding to the pixels of a rectangular region in the bottom middle of the image. Experimental results on the KITTI dataset show that this architecture achieves state-of-the-art performance, and our method can directly output absolute depth without post-processing. Moreover, our experiments on the SeasonDepth dataset also demonstrate the robustness of our model to multiple unseen environments.
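To make the scale-aware loss concrete, below is a minimal PyTorch sketch of the idea described in the abstract: back-project the pixels of a bottom-middle rectangle (assumed to lie on the ground) using the predicted depth, fit a plane to those 3D points, and penalize the deviation of the camera-to-plane distance from the known camera mounting height. This is an illustrative reconstruction, not the paper's actual implementation: the function names, the rectangle fractions, the least-squares plane fit via SVD, and the default camera height of 1.65 m (the typical KITTI setup) are assumptions.

```python
# Hypothetical sketch of a scale-aware loss based on the camera-to-ground-plane
# distance. All names, region sizes, and the default camera height are
# illustrative assumptions, not taken from the paper.

import torch


def backproject(depth, K_inv, u, v):
    """Lift pixels (u, v) with predicted depth to 3D camera coordinates."""
    ones = torch.ones_like(u, dtype=depth.dtype)
    pix = torch.stack([u.to(depth.dtype), v.to(depth.dtype), ones], dim=0)  # (3, N)
    rays = K_inv @ pix                                                      # (3, N)
    return rays * depth[v, u].unsqueeze(0)                                  # (3, N)


def scale_aware_loss(depth, K_inv, cam_height=1.65, rect=(0.75, 1.0, 0.4, 0.6)):
    """depth: (H, W) predicted depth map; K_inv: (3, 3) inverse camera intrinsics.
    rect = (top, bottom, left, right) of the assumed ground rectangle as fractions
    of image height/width; cam_height is the known camera mounting height (meters)."""
    H, W = depth.shape
    t, b, l, r = rect
    vs = torch.arange(int(t * H), int(b * H), device=depth.device)
    us = torch.arange(int(l * W), int(r * W), device=depth.device)
    v, u = torch.meshgrid(vs, us, indexing="ij")
    pts = backproject(depth, K_inv, u.reshape(-1), v.reshape(-1)).T  # (N, 3)

    # Least-squares plane fit: the unit normal is the right singular vector
    # associated with the smallest singular value of the centered point cloud.
    centroid = pts.mean(dim=0)
    _, _, Vh = torch.linalg.svd(pts - centroid, full_matrices=False)
    normal = Vh[-1]
    d = -(normal @ centroid)

    # Distance from the camera (origin of camera coordinates) to the fitted plane.
    cam_to_plane = torch.abs(d) / torch.linalg.norm(normal)

    # Penalize deviation from the known camera height to resolve the scale ambiguity.
    return torch.abs(cam_to_plane - cam_height)
```

In such a formulation the loss is differentiable with respect to the predicted depth, so adding it to the usual self-supervised photometric and smoothness terms would push the network toward metrically scaled predictions without requiring ground-truth depth.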