Paper Title
Toward Hierarchical Self-Supervised Monocular Absolute Depth Estimation for Autonomous Driving Applications
Paper Authors
Paper Abstract
In recent years, self-supervised methods for monocular depth estimation have rapidly become a significant branch of the depth estimation task, especially for autonomous driving applications. Despite the high overall precision achieved, current methods still suffer from a) imprecise object-level depth inference and b) an uncertain scale factor. The former problem causes texture copy or inaccurate object boundaries, and the latter requires current methods to use an additional sensor such as LiDAR to provide depth ground truth, or a stereo camera as an additional training input, which makes them difficult to implement. In this work, we propose to address these two problems together by introducing DNet. Our contributions are twofold: a) a novel dense connected prediction (DCP) layer is proposed to provide better object-level depth estimation, and b) specifically for autonomous driving scenarios, dense geometrical constraints (DGC) are introduced so that a precise scale factor can be recovered without additional cost for autonomous vehicles. Extensive experiments have been conducted, and the DCP layer and the DGC module are each shown to effectively solve their respective problems. Thanks to the DCP layer, object boundaries can now be better distinguished in the depth map, and the depth is more continuous at the object level. It is also demonstrated that scale recovery using DGC performs comparably to scale recovery using ground-truth information, given the camera height and provided that ground points take up more than 1.03% of the pixels. Code is available at https://github.com/TJ-IPLab/DNet.
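To make the scale-recovery idea in the abstract concrete, below is a minimal Python sketch of camera-height-based scale recovery in the spirit of the DGC module. It assumes the ground mask is already given; the actual DGC module, per the paper, derives dense geometrical constraints from the estimated depth itself (e.g., via surface normals) to find ground pixels. All function and variable names here (backproject, recover_scale, ground_mask) are illustrative and not taken from the DNet codebase.

```python
# Hypothetical sketch: recover a metric scale factor for a relative depth map
# from the known camera mounting height, assuming ground pixels are known.
import numpy as np

def backproject(depth, K):
    """Back-project a depth map (H, W) into camera-frame 3D points (H, W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def recover_scale(depth, K, real_cam_height, ground_mask):
    """Estimate the metric scale factor of an up-to-scale depth map.

    For every pixel assumed to lie on the ground plane, the camera height
    implied by the relative depth is the vertical coordinate of its
    back-projected 3D point (in the usual convention, +y points downward,
    so the ground plane sits at y == camera height). The ratio of the true
    camera height to the median implied height is the scale factor.
    """
    points = backproject(depth, K)
    implied_heights = points[..., 1][ground_mask]
    est_cam_height = np.median(implied_heights)  # robust to mask outliers
    return real_cam_height / est_cam_height
```

As a usage example, with a relative depth map `depth`, camera intrinsics `K`, a boolean `ground_mask`, and a known mounting height (roughly 1.65 m for the KITTI setup), `metric_depth = recover_scale(depth, K, 1.65, ground_mask) * depth` yields absolute depth. Taking the median over all ground pixels is one simple robust choice; the paper's dense formulation uses every detected ground point, which is why performance is reported to hold whenever ground points exceed 1.03% of the pixels.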