Paper Title

Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning

Paper Authors

Yang Liu, Lei Zhou, Xiao Bai, Lin Gu, Tatsuya Harada, Jun Zhou

Paper Abstract


Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Though many ZSL methods rely on a direct mapping between the visual and the semantic space, the calibration deviation and the hubness problem limit their generalization capability to unseen classes. Recently emerged generative ZSL methods generate unseen image features to transform ZSL into a supervised classification problem. However, most generative models still suffer from the seen-unseen bias problem, as only seen data are used for training. To address these issues, we propose a novel bidirectional-embedding-based generative model with a tight visual-semantic coupling constraint. We learn a unified latent space that calibrates the embedded parametric distributions of both the visual and the semantic space. Since the embedding of high-dimensional visual features comprises much non-semantic information, the alignment of the visual and semantic modalities in the latent space would inevitably deviate. Therefore, we introduce the information bottleneck (IB) constraint to ZSL for the first time to preserve essential attribute information during the mapping. Specifically, we utilize uncertainty estimation and the wake-sleep procedure to alleviate feature noise and improve the model's abstraction capability. In addition, our method can easily be extended to the transductive ZSL setting by generating labels for unseen images. We then introduce a robust loss to solve this label-noise problem. Extensive experimental results show that our method outperforms state-of-the-art methods in different ZSL settings on most benchmark datasets. The code will be available at https://github.com/osierboy/IBZSL.
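To make the IB constraint concrete: in the common variational formulation, an encoder maps a visual feature to a Gaussian latent code N(mu, sigma^2), and a KL penalty against a standard-normal prior compresses away non-semantic information while the task loss preserves attribute-relevant content. The sketch below is a minimal NumPy illustration of that generic variational-IB compression term and the reparameterization trick; the function names and the choice of an N(0, I) prior are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def ib_kl_term(mu, log_var):
    # Compression term of a variational information bottleneck:
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions.
    # Closed form: 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1).
    return 0.5 * np.sum(np.square(mu) + np.exp(log_var) - log_var - 1.0, axis=-1)

def reparameterize(mu, log_var, rng):
    # Sample z = mu + sigma * eps with eps ~ N(0, I), so that in a
    # gradient-based framework the sampling stays differentiable in mu, sigma.
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

# Illustrative usage: a total loss would combine a task/reconstruction loss
# with beta * ib_kl_term(mu, log_var), where beta trades off compression
# against preserving attribute information.
rng = np.random.default_rng(0)
mu, log_var = np.zeros(16), np.zeros(16)   # encoder outputs for one sample
z = reparameterize(mu, log_var, rng)        # latent code, shape (16,)
kl = ib_kl_term(mu, log_var)                # 0.0 when q(z|x) equals the prior
```

A matched posterior (mu = 0, log_var = 0) gives a KL of exactly zero, and the penalty grows as the encoder's output drifts from the prior, which is what pressures the latent space to discard non-semantic detail.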
