Paper Title


Atlas: End-to-End 3D Scene Reconstruction from Posed Images

Authors

Zak Murez, Tarrence van As, James Bartolozzi, Ayan Sinha, Vijay Badrinarayanan, Andrew Rabinovich

Abstract


We present an end-to-end 3D reconstruction method for a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images. Traditional approaches to 3D reconstruction rely on an intermediate representation of depth maps prior to estimating a full 3D model of a scene. We hypothesize that a direct regression to 3D is more effective. A 2D CNN extracts features from each image independently, which are then back-projected and accumulated into a voxel volume using the camera intrinsics and extrinsics. After accumulation, a 3D CNN refines the accumulated features and predicts the TSDF values. Additionally, semantic segmentation of the 3D model is obtained without significant computation. This approach is evaluated on the ScanNet dataset, where we significantly outperform state-of-the-art baselines (deep multi-view stereo followed by traditional TSDF fusion) both quantitatively and qualitatively. We compare our 3D semantic segmentation to prior methods that use a depth sensor, since no previous work attempts the problem with only RGB input.
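The core geometric step the abstract describes — projecting voxel centers into each image with the camera intrinsics and extrinsics and gathering 2D CNN features into a voxel volume — can be sketched as follows. This is a minimal NumPy illustration of back-projection, not the paper's implementation; all function and parameter names here are hypothetical, and nearest-neighbor sampling stands in for whatever interpolation the actual method uses.

```python
import numpy as np

def backproject_features(feat, K, world_to_cam, origin, voxel_size, grid_dims):
    """Gather 2D features into a voxel volume by projecting voxel centers.

    feat: (C, H, W) feature map from a 2D CNN.
    K: (3, 3) camera intrinsics; world_to_cam: (4, 4) camera extrinsics.
    origin: world position of the grid corner; voxel_size: edge length.
    Returns a (C, X, Y, Z) feature volume and an (X, Y, Z) visibility
    count usable for averaging features accumulated over many views.
    """
    C, H, W = feat.shape
    X, Y, Z = grid_dims
    # World coordinates of every voxel center.
    idx = np.stack(np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z),
                               indexing="ij"), axis=-1).reshape(-1, 3)
    pts = np.asarray(origin) + (idx + 0.5) * voxel_size          # (N, 3)
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    cam = (world_to_cam @ pts_h.T)[:3]                           # (3, N)
    pix = K @ cam                                                # pinhole projection
    z = pix[2]
    u = np.round(pix[0] / np.maximum(z, 1e-6)).astype(int)
    v = np.round(pix[1] / np.maximum(z, 1e-6)).astype(int)
    # Keep voxels in front of the camera that land inside the image.
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    vol = np.zeros((C, X * Y * Z), dtype=feat.dtype)
    vol[:, valid] = feat[:, v[valid], u[valid]]                  # nearest-neighbor sample
    count = valid.astype(np.int32)
    return vol.reshape(C, X, Y, Z), count.reshape(X, Y, Z)
```

Running this per frame and summing both outputs gives the accumulated feature volume (divide by the count) that a 3D CNN could then refine into TSDF values.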
