Insights on Evaluation of Camera Re-localization Using Relative Pose Regression

Paper Title

Insights on Evaluation of Camera Re-localization Using Relative Pose Regression

Authors

Shalev, Amir; Achrack, Omer; Fulkerson, Brian; Bobrovsky, Ben-Zion

Abstract

We consider the problem of relative pose regression in visual relocalization. Recently, several promising approaches have emerged in this area. We claim that even though these approaches are demonstrated on the same datasets with the same train/test splits, a faithful comparison between them has not been available: under the currently used evaluation metric, some approaches may appear to perform favorably while in reality performing worse. We reveal a tradeoff between accuracy and the 3D volume of the regressed subspace. We believe that, unlike in other relocalization approaches, in relative pose regression the 3D volume of the regressed subspace depends less on the scene and more on the method used to score the overlap, which determines how closely the sampled viewpoints lie to one another. We propose three new metrics to remedy this issue; they incorporate statistics about the volume of the regression subspace. We also propose a new pose regression network that serves as a new baseline for this task. We evaluate our trained model on the Microsoft 7-Scenes and Cambridge Landmarks datasets with both the standard metrics and the newly proposed ones, and we adjust the overlap score to reveal the tradeoff between subspace volume and performance. The results show that the proposed metrics are more robust to different overlap thresholds than the conventional metrics. Finally, we show that our network generalizes well: training on a single scene leads to little loss of performance on the other scenes.
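The conventional metrics referred to above are typically the median translation error (in metres) and the median rotation error (in degrees) over all test pairs. The following is a minimal Python sketch of that computation, assuming each relative pose is given as a 3-vector translation and a unit quaternion in (w, x, y, z) order. The extra `median_gt_baseline_m` field is a hypothetical illustration of reporting how far apart the sampled pairs are alongside the errors; it is not one of the three metrics proposed in the paper, whose definitions are not given here.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # assumed (w, x, y, z) order


def translation_error(t_pred: Vec3, t_gt: Vec3) -> float:
    """Euclidean distance between predicted and ground-truth translations."""
    return math.dist(t_pred, t_gt)


def rotation_error_deg(q_pred: Quat, q_gt: Quat) -> float:
    """Angle (degrees) of the rotation between two orientations."""
    # Normalise defensively; abs() handles the q / -q double cover.
    n_p = math.sqrt(sum(c * c for c in q_pred))
    n_g = math.sqrt(sum(c * c for c in q_gt))
    dot = abs(sum(a * b for a, b in zip(q_pred, q_gt))) / (n_p * n_g)
    # Clamp to guard against values slightly above 1 from rounding.
    return math.degrees(2.0 * math.acos(min(1.0, dot)))


def summarize(
    t_pred: Sequence[Vec3],
    t_gt: Sequence[Vec3],
    q_pred: Sequence[Quat],
    q_gt: Sequence[Quat],
) -> Dict[str, float]:
    """Median errors plus a hypothetical baseline statistic over all pairs."""
    t_err = [translation_error(p, g) for p, g in zip(t_pred, t_gt)]
    r_err = [rotation_error_deg(p, g) for p, g in zip(q_pred, q_gt)]
    # Hypothetical proxy (not the paper's metric): length of each ground-truth
    # relative translation, i.e. how far apart the paired viewpoints are.
    baseline = [math.dist(g, (0.0, 0.0, 0.0)) for g in t_gt]
    return {
        "median_t_err_m": statistics.median(t_err),
        "median_r_err_deg": statistics.median(r_err),
        "median_gt_baseline_m": statistics.median(baseline),
    }
```

Reporting a spread statistic of the ground-truth relative poses next to the median errors makes it visible when two methods were evaluated on pairs sampled at very different distances, which is the kind of hidden difference the abstract argues can distort comparisons.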