Multi-task Learning for Monocular Depth and Defocus Estimations with Real Images

Paper Title

Multi-task Learning for Monocular Depth and Defocus Estimations with Real Images

Authors

He, Renzhi; Hong, Hualin; Fu, Boya; Liu, Fei

Abstract

Monocular depth estimation and defocus estimation are two fundamental tasks in computer vision. Most existing methods treat them as separate tasks, ignoring the strong connection between them. In this work, we propose a multi-task learning network, consisting of one encoder and two decoders, that estimates the depth map and the defocus map from a single focused image. Within the multi-task network, depth estimation helps defocus estimation produce better results in weakly textured regions, and defocus estimation helps depth estimation through the strong physical connection between the two maps. We build a dataset, named the ALL-in-3D dataset, which is the first all-real-image dataset, consisting of 100K sets of all-in-focus images, focused images with their focus depth, depth maps, and defocus maps. It enables the network to learn features and solid physical connections between depth and real defocus images. Experiments demonstrate that the network learns more solid features from real focused images than from synthetic focused images. Benefiting from this multi-task structure, in which the tasks facilitate each other, our depth and defocus estimations achieve significantly better performance than other state-of-the-art algorithms. The code and dataset will be publicly available at https://github.com/cubhe/MDDNet.
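
The abstract refers to a strong physical connection between the depth map and the defocus map but does not spell it out. A minimal sketch of one common way to express that connection is the standard thin-lens circle-of-confusion formula, shown below. This is an illustration under stated assumptions, not the authors' method: the function names, the default focal length, and the default f-number are hypothetical, and the abstract does not say which optical model the paper uses.

```python
def circle_of_confusion(depth_m, focus_m, focal_length_m, f_number):
    """Blur-circle diameter on the sensor (metres) under the thin-lens model.

    depth_m:        distance from the lens to the scene point
    focus_m:        distance from the lens to the in-focus plane
    focal_length_m: lens focal length
    f_number:       focal length divided by aperture diameter
    All parameter names are assumptions made for this sketch.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    if focus_m <= focal_length_m:
        raise ValueError("focus distance must exceed the focal length")
    aperture_m = focal_length_m / f_number
    return (aperture_m * focal_length_m * abs(depth_m - focus_m)
            / (depth_m * (focus_m - focal_length_m)))


def defocus_map(depth_map, focus_m, focal_length_m=0.05, f_number=2.0):
    """Convert a depth map (nested lists, metres) into a defocus map.

    The default lens parameters are hypothetical placeholders.
    """
    return [
        [circle_of_confusion(d, focus_m, focal_length_m, f_number) for d in row]
        for row in depth_map
    ]


if __name__ == "__main__":
    depths = [[1.0, 2.0], [4.0, 8.0]]  # toy 2x2 depth map in metres
    for row in defocus_map(depths, focus_m=2.0):
        print(["%.6f" % c for c in row])
```

Under this model a pixel at the focus depth has zero blur, and blur grows as the depth moves away from the focus plane. A relation of this kind is one reason a defocus map carries depth information once the focus depth is known.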
