Paper title
Information-Theoretic Hashing for Zero-Shot Cross-Modal Retrieval
Paper authors
Paper abstract
Zero-shot cross-modal retrieval (ZS-CMR) addresses retrieval among heterogeneous data from unseen classes. Typically, to guarantee generalization, pre-defined class embeddings from natural language processing (NLP) models are used to build a common space. In this paper, instead of using an extra NLP model to define a common space beforehand, we take a different approach and construct (or learn) a common Hamming space from an information-theoretic perspective. We term our model Information-Theoretic Hashing (ITH). It is composed of two cascading modules: an Adaptive Information Aggregation (AIA) module and a Semantic Preserving Encoding (SPE) module. Specifically, the AIA module takes inspiration from the Principle of Relevant Information (PRI) to construct a common space that adaptively aggregates the intrinsic semantics of different data modalities and filters out redundant or irrelevant information. The SPE module then generates the hash codes of the different modalities by preserving the similarity of intrinsic semantics with an element-wise Kullback-Leibler (KL) divergence. A total correlation regularization term is also imposed to reduce the redundancy among the different dimensions of the hash codes. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed ITH in ZS-CMR. Source code is available in the supplementary material.
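The abstract names two information-theoretic quantities without giving formulas: an element-wise KL divergence between similarities, and a total correlation term over hash-code dimensions. The sketch below illustrates only the textbook definitions of these two quantities and is not the ITH loss itself. It assumes that similarities are values in [0, 1] treated as Bernoulli parameters and that total correlation is estimated empirically from discrete binary codes. All function names (`bernoulli_kl`, `elementwise_kl`, `total_correlation`) are hypothetical.

```python
import math
from collections import Counter


def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bern(p) || Bern(q)) in nats; inputs are clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def elementwise_kl(sim_target, sim_codes):
    """Mean element-wise KL between two same-shaped similarity matrices.

    Assumption: every entry lies in [0, 1] and is read as a Bernoulli parameter.
    """
    total, count = 0.0, 0
    for row_t, row_c in zip(sim_target, sim_codes):
        for p, q in zip(row_t, row_c):
            total += bernoulli_kl(p, q)
            count += 1
    return total / count


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable samples."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_correlation(codes):
    """Empirical total correlation: sum of per-bit entropies minus joint entropy."""
    n_bits = len(codes[0])
    marginals = sum(entropy([code[i] for code in codes]) for i in range(n_bits))
    joint = entropy([tuple(code) for code in codes])
    return marginals - joint


if __name__ == "__main__":
    target = [[1.0, 0.2], [0.2, 1.0]]   # hypothetical semantic similarities
    approx = [[0.9, 0.3], [0.3, 0.9]]   # hypothetical code similarities
    print("element-wise KL:", elementwise_kl(target, approx))

    independent = [(0, 0), (0, 1), (1, 0), (1, 1)]  # bits carry distinct information
    redundant = [(0, 0), (1, 1)]                    # second bit copies the first
    print("TC independent:", total_correlation(independent))
    print("TC redundant:", total_correlation(redundant))
```

In this toy setting, total correlation is zero when the code bits are statistically independent and grows as bits duplicate one another, which is the kind of redundancy a total correlation penalty is meant to discourage.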