Paper Title
Degradation-agnostic Correspondence from Resolution-asymmetric Stereo
Paper Authors
Abstract
In this paper, we study the problem of stereo matching from a pair of images with different resolutions, e.g., those acquired with a tele-wide camera system. Due to the difficulty of obtaining ground-truth disparity labels in diverse real-world systems, we start from an unsupervised learning perspective. However, resolution asymmetry caused by unknown degradations between two views hinders the effectiveness of the generally assumed photometric consistency. To overcome this challenge, we propose to impose the consistency between two views in a feature space instead of the image space, named feature-metric consistency. Interestingly, we find that, although a stereo matching network trained with the photometric loss is not optimal, its feature extractor can produce degradation-agnostic and matching-specific features. These features can then be utilized to formulate a feature-metric loss to avoid the photometric inconsistency. Moreover, we introduce a self-boosting strategy to optimize the feature extractor progressively, which further strengthens the feature-metric consistency. Experiments on both simulated datasets with various degradations and a self-collected real-world dataset validate the superior performance of the proposed method over existing solutions.
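The contrast between photometric and feature-metric consistency can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes rectified views where a left pixel at column x matches the right pixel at column x - d, and both views already on the same pixel grid (e.g., the lower-resolution view upsampled). It uses a plain L1 loss without the extra terms unsupervised stereo methods often add. The names `warp_right_to_left`, `photometric_loss`, `feature_metric_loss`, `feature_fn` and `box_blur_rows` are hypothetical. `feature_fn` stands in for the learned feature extractor the abstract describes, and the horizontal box blur is only a stand-in for the unknown degradations.

```python
import numpy as np


def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right view at column x - d.

    right: (H, W) or (C, H, W) array; disparity: scalar or (H, W) array in pixels.
    Returns the warped (C, H, W) array and an (H, W) mask of pixels whose
    source column falls inside the right image.
    """
    right = right[None] if right.ndim == 2 else right
    _, h, w = right.shape
    disparity = np.broadcast_to(np.asarray(disparity, dtype=float), (h, w))
    src_x = np.arange(w)[None, :] - disparity      # matching column in the right view
    valid = (src_x >= 0) & (src_x <= w - 1)
    src_x = np.clip(src_x, 0, w - 1)
    x0 = np.floor(src_x).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = src_x - x0                               # linear interpolation weight
    rows = np.arange(h)[:, None]
    warped = (1 - frac) * right[:, rows, x0] + frac * right[:, rows, x1]
    return warped, valid


def photometric_loss(left, right, disparity):
    """Mean absolute difference between the left view and the warped right view."""
    warped, valid = warp_right_to_left(right, disparity)
    left = left[None] if left.ndim == 2 else left
    diff = np.abs(left - warped).mean(axis=0)       # average over channels -> (H, W)
    return float(diff[valid].mean())


def feature_metric_loss(left, right, disparity, feature_fn):
    """Same warping-based loss, but computed on feature maps instead of pixels.

    feature_fn is a hypothetical stand-in for a feature extractor that maps an
    (H, W) or (C, H, W) image to a (C', H, W) feature map at the same resolution.
    """
    return photometric_loss(feature_fn(left), feature_fn(right), disparity)


def box_blur_rows(img, radius=1):
    """Hypothetical stand-in degradation: horizontal box blur of an (H, W) image."""
    k = 2 * radius + 1
    padded = np.pad(img, ((0, 0), (radius, radius)), mode="edge")
    return sum(padded[:, i : i + img.shape[1]] for i in range(k)) / k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, d = 8, 32, 3
    left = rng.random((h, w))
    right = np.empty_like(left)
    right[:, : w - d] = left[:, d:]                 # so that left(x) = right(x - d)
    right[:, w - d :] = left[:, -1:]
    disp = np.full((h, w), float(d))
    print("clean pair, true disparity:   ", photometric_loss(left, right, disp))
    print("blurred right, true disparity:", photometric_loss(left, box_blur_rows(right), disp))
```

On a clean synthetic pair the photometric loss vanishes at the true disparity. Once the right view is blurred it stays above zero even at the true disparity, which is the photometric inconsistency the abstract refers to. Swapping in `feature_metric_loss` only removes that gap if `feature_fn` maps both views to similar features regardless of the degradation. The abstract reports that the feature extractor of a stereo network trained with the photometric loss provides this property.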