Paper Title
RelMobNet: End-to-end relative camera pose estimation using a robust two-stage training
Paper Authors
Paper Abstract
Relative camera pose estimation, i.e., estimating the translation and rotation vectors between a pair of images taken from different locations, is an important component of systems in augmented reality and robotics. In this paper, we present an end-to-end relative camera pose estimation network using a siamese architecture that is independent of camera parameters. The network is trained on the Cambridge Landmarks data, using four individual scene datasets and a dataset that combines the four scenes. To improve generalization, we propose a novel two-stage training scheme that alleviates the need for a hyperparameter to balance the translation and rotation loss scales. We compare the proposed method with one-stage-training CNN-based methods such as RPNet and RCPNet and demonstrate that it improves translation vector estimation by 16.11%, 28.88%, and 52.27% on the Kings College, Old Hospital, and St Marys Church scenes, respectively. To demonstrate texture invariance, we investigate the generalization of the proposed method by augmenting the datasets to different scene styles with generative adversarial networks, as an ablation study. In addition, we present a qualitative assessment of the epipolar lines derived from our network predictions and the ground-truth poses.
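The abstract describes the architecture and training procedure only at a high level. As a rough illustration (not the authors' implementation), the sketch below shows one way a parameter-shared siamese pose regressor with a two-stage schedule could be set up in PyTorch. The MobileNetV3 backbone (suggested by the name RelMobNet), the head widths, the L1 loss, and the stage-two backbone freeze are all assumptions made for this example.

```python
# Minimal sketch (assumptions: PyTorch, a MobileNetV3 backbone, L1 losses,
# and a stage-two backbone freeze; none of this is confirmed by the abstract).
import torch
import torch.nn as nn
from torchvision import models

class SiameseRelPoseNet(nn.Module):
    """Siamese regressor: one shared backbone, two images, relative pose out."""

    def __init__(self):
        super().__init__()
        backbone = models.mobilenet_v3_large(weights=None)
        self.features = backbone.features           # shared (tied) weights
        self.pool = nn.AdaptiveAvgPool2d(1)
        feat_dim = 960                              # mobilenet_v3_large channels
        self.t_head = nn.Sequential(                # translation head (3-vector)
            nn.Linear(2 * feat_dim, 512), nn.ReLU(), nn.Linear(512, 3))
        self.q_head = nn.Sequential(                # rotation head (quaternion)
            nn.Linear(2 * feat_dim, 512), nn.ReLU(), nn.Linear(512, 4))

    def forward(self, img_a, img_b):
        fa = self.pool(self.features(img_a)).flatten(1)
        fb = self.pool(self.features(img_b)).flatten(1)
        f = torch.cat([fa, fb], dim=1)              # joint pair descriptor
        q = self.q_head(f)
        q = q / q.norm(dim=1, keepdim=True)         # unit quaternion
        return self.t_head(f), q

def two_stage_train(model, loader, epochs=(50, 50)):
    """Assumed two-stage schedule. Stage 1: train everything on the unweighted
    loss sum. Stage 2: freeze the shared backbone so each head is driven only
    by its own loss term."""
    l1 = nn.L1Loss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs[0]):                      # stage 1: joint training
        for img_a, img_b, t_gt, q_gt in loader:
            t, q = model(img_a, img_b)
            loss = l1(t, t_gt) + l1(q, q_gt)
            opt.zero_grad(); loss.backward(); opt.step()
    for p in model.features.parameters():           # stage 2: freeze backbone
        p.requires_grad = False
    opt = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=1e-5)
    for _ in range(epochs[1]):
        for img_a, img_b, t_gt, q_gt in loader:
            t, q = model(img_a, img_b)
            loss = l1(t, t_gt) + l1(q, q_gt)        # heads no longer compete
            opt.zero_grad(); loss.backward(); opt.step()
```

With disjoint heads and a frozen backbone, the gradient reaching each head in stage two depends only on its own loss term, so the relative scale of the translation and rotation losses no longer needs to be tuned with a balancing hyperparameter, which is one plausible reading of the two-stage training the abstract describes.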