Paper title
PixSelect: Less but Reliable Pixels for Accurate and Efficient Localization
Paper authors
Abstract
Accurate camera pose estimation is a fundamental requirement for numerous applications, such as autonomous driving, mobile robotics, and augmented reality. In this work, we address the problem of estimating the global 6-DoF camera pose from a single RGB image in a given environment. Previous works consider every part of the image valuable for localization. However, many image regions, such as the sky, occlusions, and repetitive non-distinguishable patterns, cannot be utilized for localization. In addition to adding unnecessary computational effort, extracting and matching features from such regions produces many wrong matches, which in turn degrade localization accuracy and efficiency. Our work addresses this particular issue and shows, by exploiting an interesting concept of sparse 3D models, that we can use discriminative environment parts and avoid useless image regions for single-image localization. Interestingly, by avoiding selecting keypoints from unreliable image regions such as trees, bushes, cars, pedestrians, and occlusions, our work acts naturally as an outlier filter. This makes our system highly efficient, in that only a minimal set of correspondences is needed, and highly accurate, as the number of outliers is low. Our work exceeds state-of-the-art methods on the outdoor Cambridge Landmarks dataset. Relying on only a single image at inference, it is more accurate than methods that exploit pose priors and/or reference 3D models, while being much faster. By choosing as few as 100 correspondences, it surpasses similar methods that localize from thousands of correspondences, while being more efficient. In particular, compared to these methods, it improves localization by 33% on the OldHospital scene. Furthermore, it outperforms direct pose regressors, even those that learn from sequences of images.
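The claim that a low outlier count makes localization more efficient can be illustrated with the standard RANSAC iteration bound. The abstract does not name the robust estimator or pose solver used, so this is only a sketch under assumptions: a generic RANSAC loop, a hypothetical minimal sample size of 4 correspondences, and a 99% success confidence. The function name `ransac_iterations` and its parameters are illustrative and do not come from the paper.

```python
import math


def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    """Number of RANSAC iterations needed to draw at least one all-inlier
    minimal sample with the requested confidence.

    Assumed, not taken from the paper: sample_size=4 and confidence=0.99.
    """
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if inlier_ratio == 1.0:
        return 1  # every sample is outlier-free
    # Probability that one random minimal sample contains only inliers.
    p_all_inliers = inlier_ratio ** sample_size
    # Solve (1 - p_all_inliers)^N <= 1 - confidence for N.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))


if __name__ == "__main__":
    # Compare correspondence sets with many outliers against cleaner ones.
    for ratio in (0.3, 0.5, 0.7, 0.9):
        print(f"inlier ratio {ratio:.1f}: {ransac_iterations(ratio)} iterations")
```

Under these assumptions, the required iteration count falls steeply as the inlier ratio rises, which is consistent with the abstract's argument that filtering unreliable regions before matching reduces the work needed at pose-estimation time.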