Paper Title
Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings
Paper Authors
Paper Abstract
To what extent are two images picturing the same 3D surfaces? Even when this is a known scene, the answer typically requires an expensive search across scale space, with matching and geometric verification of large sets of local features. This expense is further multiplied when a query image is evaluated against a gallery, e.g. in visual relocalization. While we don't obviate the need for geometric verification, we propose an interpretable image-embedding that cuts the search in scale space to essentially a lookup. Our approach measures the asymmetric relation between two images. The model then learns a scene-specific measure of similarity, from training examples with known 3D visible-surface overlaps. The result is that we can quickly identify, for example, which test image is a close-up version of another, and by what scale factor. Subsequently, local features need only be detected at that scale. We validate our scene-specific model by showing how this embedding yields competitive image-matching results, while being simpler, faster, and also interpretable by humans.
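The core idea of the title's "non-metric box embedding" can be illustrated concretely: if each image is embedded as an axis-aligned box, then normalizing the intersection volume by each box's own volume yields an asymmetric (hence non-metric) overlap, which is what allows one image to be recognized as a close-up of another. The sketch below is a minimal illustration of that asymmetric-overlap reading, not the paper's implementation; the box corners min_a/max_a stand in for a learned embedding, and the exact loss and network are described in the paper itself.

```python
import numpy as np

def box_overlap(min_a, max_a, min_b, max_b):
    """Asymmetric overlap of axis-aligned boxes A and B in R^d:
    the fraction of A's volume covered by the intersection of A and B.
    Note overlap(A, B) != overlap(B, A) in general, so this is not a
    metric distance -- the asymmetry encodes 'close-up of' relations."""
    # Intersection box: elementwise max of the mins, min of the maxes.
    inter_min = np.maximum(min_a, min_b)
    inter_max = np.minimum(max_a, max_b)
    inter_side = np.clip(inter_max - inter_min, 0.0, None)  # empty -> 0
    inter_vol = np.prod(inter_side)
    vol_a = np.prod(max_a - min_a)
    return inter_vol / vol_a

# Toy example (hypothetical embedding values): box B lies strictly
# inside box A, mimicking a query that is a close-up of a wider view.
min_a, max_a = np.array([0.0, 0.0]), np.array([4.0, 4.0])
min_b, max_b = np.array([1.0, 1.0]), np.array([2.0, 2.0])

o_ab = box_overlap(min_a, max_a, min_b, max_b)  # 0.0625: A sees far more than B
o_ba = box_overlap(min_b, max_b, min_a, max_a)  # 1.0: all of B is visible in A
print(o_ab, o_ba)
```

The direction and magnitude of the asymmetry (here 1.0 versus 0.0625) indicate which image is the close-up and, per the abstract, hint at the relative scale factor at which local features should then be detected.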