Paper Title


Scale Invariant Semantic Segmentation with RGB-D Fusion

Authors

Mohammad Dawud Ansari, Alwi Husada, Didier Stricker

Abstract


In this paper, we propose a neural network architecture for scale-invariant semantic segmentation using RGB-D images. In addition to color images, we utilize depth information as an extra modality. This is particularly relevant in outdoor scenes, which contain objects at different scales depending on their distance from the camera: nearby objects occupy significantly more pixels than distant ones. We propose to incorporate depth information into the RGB data for pixel-wise semantic segmentation in order to handle the different object scales in an outdoor scene. We adapt the well-known DeepLab-v2 (ResNet-101) model as our RGB baseline. Depth images are passed separately as an additional input through a distinct branch, and the intermediate feature maps of the color and depth branches are fused using a novel fusion block. Our model is compact and can easily be applied to other RGB models. We perform an extensive qualitative and quantitative evaluation on the challenging Cityscapes dataset, and the results obtained are comparable to the state of the art. Additionally, we evaluate our model on a self-recorded real dataset. For the sake of an extended evaluation of a driving scene with ground truth, we generated a synthetic dataset using the popular vehicle simulation project CARLA. The results obtained from the real and synthetic datasets show the effectiveness of our approach.
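The abstract names the two-branch design and a fusion block but does not specify the block's internals. As a rough illustration of how intermediate RGB and depth feature maps could be fused, the following PyTorch sketch uses channel concatenation followed by a 1x1 projection; the class name `FusionBlock`, the channel counts, and the concatenation-plus-projection scheme are all assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn


class FusionBlock(nn.Module):
    """Hypothetical fusion block (the paper's actual design is not given
    in the abstract): concatenates RGB and depth feature maps along the
    channel axis and projects back with a 1x1 convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # Both feature maps are assumed to share shape (N, C, H, W).
        return self.project(torch.cat([rgb_feat, depth_feat], dim=1))


# Example: fuse intermediate feature maps from the two branches.
rgb_feat = torch.randn(1, 256, 64, 128)    # from the RGB (DeepLab-v2) branch
depth_feat = torch.randn(1, 256, 64, 128)  # from the separate depth branch
fused = FusionBlock(256)(rgb_feat, depth_feat)
print(fused.shape)  # torch.Size([1, 256, 64, 128])
```

Because the fused output keeps the RGB branch's channel count and spatial size, such a block can be dropped between existing layers of an RGB backbone, which is consistent with the abstract's claim that the fusion design can easily be applied to other RGB models.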
