Paper Title
Learning Feature Descriptors using Camera Pose Supervision
Paper Authors
Paper Abstract
Recent research on learned visual descriptors has shown promising improvements in correspondence estimation, a key component of many 3D vision tasks. However, existing descriptor learning frameworks typically require ground-truth correspondences between feature points for training, which are challenging to acquire at scale. In this paper we propose a novel weakly-supervised framework that can learn feature descriptors solely from relative camera poses between images. To do so, we devise both a new loss function that exploits the epipolar constraint given by camera poses, and a new model architecture that makes the whole pipeline differentiable and efficient. Because we no longer need pixel-level ground-truth correspondences, our framework opens up the possibility of training on much larger and more diverse datasets for better and unbiased descriptors. We call the resulting descriptors CAmera Pose Supervised, or CAPS, descriptors. Though trained with weak supervision, CAPS descriptors outperform even prior fully-supervised descriptors and achieve state-of-the-art performance on a variety of geometric tasks. Project Page: https://qianqianwang68.github.io/CAPS/
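The key idea in the abstract is that relative camera poses alone constrain where a correspondence can lie: a point in one image must match a point on its epipolar line in the other image. Below is a minimal, illustrative sketch (not the paper's actual implementation) of how such an epipolar-constraint signal could be computed; the function names, the use of NumPy, and the point-to-line distance formulation are assumptions made for clarity.

```python
# Sketch of an epipolar-constraint supervision signal, assuming known
# relative pose (R, t) from image 1 to image 2 and intrinsics K1, K2.
# A point x1 in image 1 must project onto the epipolar line F @ x1 in
# image 2; the distance of a predicted match x2 to that line can act
# as a weak loss in place of ground-truth correspondences.
import numpy as np

def skew(t):
    """Cross-product (skew-symmetric) matrix of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(R, t, K1, K2):
    """F from relative pose: E = [t]_x R, F = K2^{-T} E K1^{-1}."""
    E = skew(t) @ R
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_distance(x1, x2, F):
    """Distance (in pixels) of x2 in image 2 to the epipolar line of x1."""
    x1_h = np.append(x1, 1.0)   # homogeneous coordinates
    x2_h = np.append(x2, 1.0)
    line = F @ x1_h             # epipolar line in image 2: a*x + b*y + c = 0
    return abs(x2_h @ line) / np.linalg.norm(line[:2])

# Usage sketch: average the epipolar distance of predicted matches,
# e.g. loss = mean(epipolar_distance(x1_i, match(x1_i), F) for all i).
```

Note that this constraint is weaker than pixel-level correspondence (any point on the epipolar line satisfies it), which is why the paper pairs it with a model architecture that keeps the full matching pipeline differentiable.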