Paper Title

Positional Information is All You Need: A Novel Pipeline for Self-Supervised SVDE from Videos

Paper Authors

Juan Luis Gonzalez Bello, Jaeho Moon, Munchurl Kim

Abstract

Recently, much attention has been drawn to learning the underlying 3D structures of a scene from monocular videos in a fully self-supervised fashion. One of the most challenging aspects of this task is handling the independently moving objects as they break the rigid-scene assumption. For the first time, we show that pixel positional information can be exploited to learn SVDE (Single View Depth Estimation) from videos. Our proposed moving object (MO) masks, which are induced by shifted positional information (SPI) and referred to as `SPIMO' masks, are very robust and consistently remove the independently moving objects in the scenes, allowing for better learning of SVDE from videos. Additionally, we introduce a new adaptive quantization scheme that assigns the best per-pixel quantization curve for our depth discretization. Finally, we employ existing boosting techniques in a new way to further self-supervise the depth of the moving objects. With these features, our pipeline is robust against moving objects and generalizes well to high-resolution images, even when trained with small patches, yielding state-of-the-art (SOTA) results with almost 8.5x fewer parameters than the previous works that learn from videos. We present extensive experiments on KITTI and CityScapes that show the effectiveness of our method.
