Paper title
Is my Depth Ground-Truth Good Enough? HAMMER -- Highly Accurate Multi-Modal Dataset for DEnse 3D Scene Regression
Paper authors
Paper abstract
Depth estimation is a core task in 3D computer vision. Recent methods investigate monocular depth estimation trained with various depth sensor modalities. Every sensor has advantages and drawbacks that stem from the nature of its estimates. The literature mostly investigates the mean average error of the depth, and sensor capabilities are typically not discussed. Indoor environments in particular, however, pose challenges for some devices. Textureless regions are challenging for structure from motion, reflective materials are problematic for active sensing, and distances to translucent materials are intricate to measure with existing sensors. This paper proposes HAMMER, a dataset comprising depth estimates from multiple commonly used sensors for indoor depth estimation, namely ToF, stereo, and structured light, together with monocular RGB+P data. We construct highly reliable ground truth depth maps with the help of 3D scanners and aligned renderings. A popular depth estimator is trained on this data and on typical depth sensors. The estimates are extensively analyzed on different scene structures. We notice generalization issues arising from various sensor technologies in household environments with challenging but everyday scene content. HAMMER, which we make publicly available, provides a reliable base to pave the way to targeted depth improvements and sensor fusion approaches.