Paper Title
Semantic-Guided Representation Enhancement for Self-supervised Monocular Trained Depth Estimation
Paper Authors
Paper Abstract
Self-supervised depth estimation has shown great effectiveness in producing high-quality depth maps given only image sequences as input. However, its performance usually drops when estimating depth on border areas or thin-structured objects due to limited depth representation ability. In this paper, we address this problem by proposing a semantic-guided depth representation enhancement method, which promotes both local and global depth feature representations by leveraging rich contextual information. Instead of the single depth network used in conventional paradigms, we propose an extra semantic segmentation branch to offer additional contextual features for depth estimation. Based on this framework, we enhance the local feature representation by sampling point-based features located on semantic edges and feeding them to an individual Semantic-guided Edge Enhancement module (SEEM), which is specifically designed to promote depth estimation on the challenging semantic borders. Then, we improve the global feature representation by proposing a semantic-guided multi-level attention mechanism, which enhances the semantic and depth features by exploring pixel-wise correlations in the multi-level depth decoding scheme. Extensive experiments validate the distinct superiority of our method in capturing highly accurate depth in challenging image areas such as semantic category borders and thin objects. Both quantitative and qualitative experiments on KITTI show that our method outperforms the state-of-the-art methods.
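To make the two enhancement ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) sampling point-based features on semantic borders and refining depth features only at those points, and (b) a pixel-wise cross-attention in which semantic features guide depth features. This is an illustrative sketch under the assumption of a standard encoder-decoder depth network; the module and function names (sample_edge_points, EdgePointRefiner, SemanticGuidedAttention) are hypothetical and do not reproduce the authors' SEEM or attention design.

```python
# Hedged sketch of semantic-edge point sampling and semantic-guided attention.
# All names here are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn


def sample_edge_points(seg_logits, max_points=1024):
    """Pick pixel coordinates lying on semantic-category borders.

    A border pixel is detected where the argmax label differs from its
    right or bottom neighbour (a simple proxy for semantic edges).
    """
    labels = seg_logits.argmax(dim=1)                      # (B, H, W)
    edge = torch.zeros_like(labels, dtype=torch.bool)
    edge[:, :, :-1] |= labels[:, :, :-1] != labels[:, :, 1:]
    edge[:, :-1, :] |= labels[:, :-1, :] != labels[:, 1:, :]
    coords = []
    for b in range(labels.shape[0]):
        ys, xs = torch.nonzero(edge[b], as_tuple=True)
        if ys.numel() > max_points:                        # keep a fixed point budget
            idx = torch.randperm(ys.numel(), device=ys.device)[:max_points]
            ys, xs = ys[idx], xs[idx]
        coords.append(torch.stack([ys, xs], dim=-1))       # (N_b, 2)
    return coords


class EdgePointRefiner(nn.Module):
    """Local enhancement: refine depth features only at the sampled edge points."""

    def __init__(self, channels):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
        )

    def forward(self, depth_feat, seg_feat, coords):
        # depth_feat, seg_feat: (B, C, H, W); coords: list of (N_b, 2) pixel indices
        out = depth_feat.clone()
        for b, pts in enumerate(coords):
            if pts.numel() == 0:
                continue
            ys, xs = pts[:, 0], pts[:, 1]
            d = depth_feat[b, :, ys, xs].t()               # (N_b, C) depth point features
            s = seg_feat[b, :, ys, xs].t()                 # (N_b, C) semantic point features
            out[b, :, ys, xs] = (self.mlp(torch.cat([d, s], dim=-1)) + d).t()
        return out


class SemanticGuidedAttention(nn.Module):
    """Global enhancement: pixel-wise cross-attention with semantic features as guidance."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))          # learned residual weight

    def forward(self, depth_feat, seg_feat):
        b, c, h, w = depth_feat.shape
        q = self.query(depth_feat).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(seg_feat).flatten(2)                       # (B, C', HW)
        v = self.value(seg_feat).flatten(2).transpose(1, 2)     # (B, HW, C)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return depth_feat + self.gamma * ctx                    # residual fusion
```

In a multi-level decoding scheme, one such attention block could be applied at each decoder scale so that semantic guidance is injected repeatedly rather than only once; this is a design choice of the sketch, not a claim about the paper's exact architecture.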