Paper Title
Self-Supervised Consistent Quantization for Fully Unsupervised Image Retrieval
Paper Authors
Paper Abstract
Unsupervised image retrieval aims to learn an efficient retrieval system without expensive data annotations, but most existing methods rely heavily on handcrafted feature descriptors or pre-trained feature extractors. To minimize human supervision, a recent advance proposes deep fully unsupervised image retrieval, which trains a deep model from scratch to jointly optimize visual features and quantization codes. However, the existing approach mainly focuses on instance contrastive learning without considering the underlying semantic structure information, resulting in sub-optimal performance. In this work, we propose a novel self-supervised consistent quantization approach to deep fully unsupervised image retrieval, which consists of part consistent quantization and global consistent quantization. In part consistent quantization, we devise part neighbor semantic consistency learning with codeword diversity regularization. This allows discovering the underlying neighbor structure information of sub-quantized representations as self-supervision. In global consistent quantization, we employ contrastive learning for both embedding and quantized representations and fuse these representations for consistent contrastive regularization between instances. This can compensate for the loss of useful representation information during quantization and regularize consistency between instances. With a unified learning objective of part and global consistent quantization, our approach exploits richer self-supervision cues to facilitate model learning. Extensive experiments on three benchmark datasets show the superiority of our approach over the state-of-the-art methods.
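To make the two ingredients described in the abstract more concrete, the following is a minimal, hypothetical PyTorch sketch of (i) soft product quantization with a codeword diversity regularizer and (ii) instance contrastive losses applied to both the continuous embedding and its quantized counterpart. It is not the authors' implementation; all module names, hyperparameters, and the exact form of the losses (SoftProductQuantizer, nt_xent, the 0.1 regularization weight, etc.) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's released code) of soft product
# quantization with codeword diversity regularization plus contrastive losses
# over embedding and quantized representations.
import torch
import torch.nn.functional as F

class SoftProductQuantizer(torch.nn.Module):
    def __init__(self, dim=128, num_books=4, num_words=64, tau=0.2):
        super().__init__()
        assert dim % num_books == 0
        self.num_books, self.sub_dim, self.tau = num_books, dim // num_books, tau
        # One learnable codebook per sub-space: (num_books, num_words, sub_dim)
        self.codebooks = torch.nn.Parameter(torch.randn(num_books, num_words, self.sub_dim))

    def forward(self, z):
        # z: (batch, dim) continuous embedding, split into one sub-vector per codebook
        parts = z.view(z.size(0), self.num_books, self.sub_dim)
        quantized, diversity = [], 0.0
        for m in range(self.num_books):
            logits = parts[:, m] @ self.codebooks[m].t() / self.tau   # (batch, num_words)
            attn = logits.softmax(dim=-1)                             # soft codeword assignment
            quantized.append(attn @ self.codebooks[m])                # soft sub-quantized vector
            # Diversity regularizer (assumed form): negative entropy of mean codeword
            # usage, so minimizing it pushes toward uniform codeword utilization.
            usage = attn.mean(dim=0)
            diversity = diversity + (usage * usage.clamp_min(1e-8).log()).sum()
        return torch.cat(quantized, dim=1), diversity / self.num_books

def nt_xent(a, b, temperature=0.1):
    # Standard normalized-temperature cross-entropy between paired views (a_i, b_i).
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return F.cross_entropy(logits, targets)

# Toy usage: two augmented views of a batch, encoded by a stand-in backbone.
encoder = torch.nn.Linear(512, 128)          # placeholder for a CNN encoder
quantizer = SoftProductQuantizer()
x1, x2 = torch.randn(8, 512), torch.randn(8, 512)
z1, z2 = encoder(x1), encoder(x2)
q1, div1 = quantizer(z1)
q2, div2 = quantizer(z2)
# Contrastive terms on embeddings, on quantized codes, and across the two,
# as a rough stand-in for the paper's consistent contrastive regularization.
loss = nt_xent(z1, z2) + nt_xent(q1, q2) + nt_xent(z1, q2) + 0.1 * (div1 + div2)
loss.backward()
```

The cross term `nt_xent(z1, q2)` is one simple way to couple the embedding and quantized spaces so that quantization does not discard information useful for instance discrimination; the paper's actual fusion and weighting scheme may differ.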