Paper Title

DenserNet: Weakly Supervised Visual Localization Using Multi-scale Feature Aggregation

Paper Authors

Dongfang Liu, Yiming Cui, Liqi Yan, Christos Mousas, Baijian Yang, Yingjie Chen

Abstract

In this work, we introduce a Denser Feature Network (DenserNet) for visual localization. Our work provides three principal contributions. First, we develop a convolutional neural network (CNN) architecture which aggregates feature maps at different semantic levels for image representations. Using denser feature maps, our method can produce more keypoint features and increase image retrieval accuracy. Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs. We use a weakly supervised triplet ranking loss to learn discriminative features and encourage keypoint feature repeatability for image representation. Finally, our method is computationally efficient as our architecture shares features and parameters during computation. Our method can perform accurate large-scale localization under challenging conditions while remaining within computational constraints. Extensive experimental results indicate that our method sets a new state-of-the-art on four challenging large-scale localization benchmarks and three image retrieval benchmarks.
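To illustrate the two core ideas in the abstract, here is a minimal, simplified sketch: (1) aggregating feature maps from different semantic levels by upsampling them to a common resolution and summing, and (2) a triplet ranking loss that pushes a query's positive (same place) closer than its negative (different place). This is an illustrative toy in plain Python, not the paper's actual architecture or loss implementation; the function names, nearest-neighbour upsampling, and Euclidean distance are assumptions for clarity.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def aggregate_maps(maps):
    """Toy multi-scale aggregation: nearest-neighbour upsample each 2-D
    score map to the largest spatial size, then sum element-wise.

    `maps` is a list of 2-D lists (feature/score maps at different levels).
    """
    H = max(len(m) for m in maps)
    W = max(len(m[0]) for m in maps)
    out = [[0.0] * W for _ in range(H)]
    for m in maps:
        h, w = len(m), len(m[0])
        for i in range(H):
            for j in range(W):
                # Map the target pixel back to the coarser source grid.
                out[i][j] += m[i * h // H][j * w // W]
    return out


def triplet_loss(q, p, n, margin=0.1):
    """Hinge-style triplet ranking loss on embedding vectors:
    zero once the positive is closer than the negative by `margin`."""
    return max(0.0, margin + dist(q, p) - dist(q, n))
```

For example, aggregating a 2x2 map with a 4x4 map yields a 4x4 map; and a triplet whose positive already beats the negative by more than the margin incurs zero loss, which is what lets GPS-tagged pairs act as weak supervision without pixel-level labels.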
