Paper Title
Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image
Paper Authors
Paper Abstract
Despite significant progress made in the past few years, challenges remain for depth estimation using a single monocular image. First, it is nontrivial to train a metric-depth prediction model that can generalize well to diverse scenes mainly due to limited training data. Thus, researchers have built large-scale relative depth datasets that are much easier to collect. However, existing relative depth estimation models often fail to recover accurate 3D scene shapes due to the unknown depth shift caused by training with the relative depth data. We tackle this problem here and attempt to estimate accurate scene shapes by training on large-scale relative depth data, and estimating the depth shift. To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes. As the two modules are trained separately, we do not need strictly paired training data. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to improve training with relative depth annotation. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation. Code is available at: https://git.io/Depth
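The abstract describes a two-stage pipeline: a depth network predicts depth up to an unknown scale and shift, and a point-cloud module then recovers the depth shift and the camera's focal length so that the scene can be unprojected into 3D. The sketch below illustrates that flow under stated assumptions; it is not the authors' released code. The names `depth_net`, `pcd_module`, and the unprojection details (single focal length, principal point at the image center) are illustrative assumptions based only on the abstract.

```python
# Minimal sketch (assumptions only, not the paper's implementation) of combining
# an affine-invariant depth map with a recovered shift and focal length to
# reconstruct a camera-space point cloud.

import numpy as np

def unproject_to_point_cloud(depth: np.ndarray, focal_length: float) -> np.ndarray:
    """Back-project a depth map into camera-space 3D points.

    depth: (H, W) depth map, already corrected for the unknown shift.
    focal_length: focal length in pixels (assumed equal for x and y).
    Returns an (H*W, 3) array of 3D points.
    """
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0  # assume the principal point is the image center
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / focal_length
    y = (v - cy) * depth / focal_length
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def recover_scene_shape(image, depth_net, pcd_module):
    """Two-stage recovery: (1) depth up to scale and shift, (2) shift + focal length."""
    # Stage 1: monocular depth prediction, valid only up to an unknown affine transform.
    d_affine = depth_net(image)                  # (H, W), scale- and shift-ambiguous
    # Stage 2: a point-cloud module predicts the depth shift and focal length that
    # make the unprojected point cloud geometrically plausible.
    shift, focal = pcd_module(d_affine)          # scalars
    d_shifted = d_affine + shift                 # still up to a global scale
    return unproject_to_point_cloud(d_shifted, focal)
```

Note that the global scale remains unknown after this step: a uniform scale does not distort the reconstructed shape, whereas an uncorrected depth shift does, which is why the second stage focuses on recovering the shift and the focal length rather than metric scale.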