Paper Title

Generative Semantic Hashing Enhanced via Boltzmann Machines

Authors

Lin Zheng, Qinliang Su, Dinghan Shen, Changyou Chen

Abstract

Generative semantic hashing is a promising technique for large-scale information retrieval thanks to its fast retrieval speed and small memory footprint. For the tractability of training, existing generative-hashing methods mostly assume a factorized form for the posterior distribution, enforcing independence among the bits of hash codes. From the perspectives of both model representation and code space size, independence is not always the best assumption. In this paper, to introduce correlations among the bits of hash codes, we propose to employ the distribution of a Boltzmann machine as the variational posterior. To address the intractability issue of training, we first develop an approximate method to reparameterize the distribution of a Boltzmann machine by augmenting it as a hierarchical concatenation of a Gaussian-like distribution and a Bernoulli distribution. Based on that, an asymptotically exact approximation to the evidence lower bound (ELBO) is further derived. With these novel techniques, the entire model can be optimized efficiently. Extensive experimental results demonstrate that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.
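The hierarchical "Gaussian-like then Bernoulli" augmentation can be illustrated with a standard Gaussian-augmentation identity: for z ~ N(0, I), E[exp(aᵀz)] = exp(½‖a‖²), so with a coupling matrix W = LLᵀ a Boltzmann-style term exp(½bᵀWb) can be traded for a shared Gaussian variable z, after which the bits are conditionally independent Bernoullis with logits h + Lz. The minimal NumPy sketch below (not the paper's implementation; `h`, `L`, and the rank-1 coupling are illustrative assumptions) shows how a shared z induces correlated bits. Note the per-z Bernoulli normalizer makes this naive hierarchy only an approximation of the true Boltzmann distribution, which is consistent with the abstract's need for an approximate reparameterization and an asymptotically exact bound.

```python
import numpy as np

def sample_correlated_bits(h, L, rng):
    """Draw one binary code from the hierarchical (Gaussian -> Bernoulli) sampler.

    h : (n,) bias vector of the Boltzmann machine (illustrative)
    L : (n, k) factor with W ~= L @ L.T, the bit-coupling matrix (illustrative)
    """
    z = rng.standard_normal(L.shape[1])    # shared Gaussian auxiliary variable
    logits = h + L @ z                     # the shared z couples the bits
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> per-bit Bernoulli probs
    return (rng.random(h.shape[0]) < probs).astype(int)

# Tiny demo: two positively coupled bits should agree more often than chance.
rng = np.random.default_rng(0)
h = np.zeros(2)
L = np.array([[1.5], [1.5]])  # rank-1 positive coupling, W = L @ L.T
codes = np.array([sample_correlated_bits(h, L, rng) for _ in range(5000)])
agree = np.mean(codes[:, 0] == codes[:, 1])  # fraction of samples where bits match
```

Under a factorized posterior the two bits would agree about half the time; here the shared z pushes both logits in the same direction, so agreement is well above 0.5, which is exactly the kind of inter-bit correlation a factorized posterior cannot express.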
