Learning Invariant Visual Representations for Compositional Zero-Shot Learning

Paper Title

Learning Invariant Visual Representations for Compositional Zero-Shot Learning

Authors

Zhang, Tian; Liang, Kongming; Du, Ruoyi; Sun, Xian; Ma, Zhanyu; Guo, Jun

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from the attribute-object compositions seen in the training set. Previous works mainly project an image and a composition into a common embedding space to measure their compatibility score. However, attributes and objects share the same learned visual representation. This leads the model to exploit spurious correlations and become biased towards seen pairs. Instead, we reconsider CZSL as an out-of-distribution generalization problem. If each object is treated as a domain, we can learn object-invariant features that reliably recognize the attributes attached to any object. Similarly, attribute-invariant features can be learned for recognizing objects, with attributes treated as domains. Specifically, we propose an invariant feature learning framework that aligns different domains at both the representation and gradient levels to capture the intrinsic characteristics associated with each task. Experiments on two CZSL benchmarks demonstrate that the proposed method significantly outperforms the previous state of the art.
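The abstract does not spell out the alignment losses. The sketch below is only a rough illustration of the "domains" idea. It assumes a simple mean-matching penalty that is not the authors' method: samples are grouped by a domain label, and the penalty measures how far each domain's mean vector sits from the average of all domain means. Objects serve as the domains when learning attribute features, and attributes serve as the domains when learning object features. The names `domain_means` and `alignment_penalty` and all toy values are hypothetical. Passing per-sample loss gradients instead of features gives a crude gradient-level analogue.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def domain_means(vectors: Sequence[Vector],
                 domains: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Average all vectors that share a domain label (e.g. the same object)."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for vec, dom in zip(vectors, domains):
        if dom not in sums:
            sums[dom] = [0.0] * len(vec)
        sums[dom] = [s + v for s, v in zip(sums[dom], vec)]
        counts[dom] += 1
    return {dom: [s / counts[dom] for s in total] for dom, total in sums.items()}


def alignment_penalty(vectors: Sequence[Vector],
                      domains: Sequence[Hashable]) -> float:
    """Hypothetical mean-matching penalty (not the paper's loss).

    Returns the mean squared distance between each domain's mean vector and
    the average of all domain means. It is 0.0 when every domain has the same
    mean, i.e. the vectors carry no domain-specific offset. Assumes a
    non-empty batch of equal-length vectors.
    """
    means = list(domain_means(vectors, domains).values())
    dim = len(means[0])
    center = [sum(m[i] for m in means) / len(means) for i in range(dim)]
    sq_dists = [sum((m[i] - center[i]) ** 2 for i in range(dim)) for m in means]
    return sum(sq_dists) / len(means)


# Toy batch: 2-D features of four images, labelled by (attribute, object).
feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 2.0], [1.0, 2.0]]
attributes = ["red", "red", "red", "red"]
objects = ["apple", "apple", "car", "car"]

# Attribute branch: objects act as domains, so object-specific shifts in the
# features used to predict "red" are penalized.
print(alignment_penalty(feats, objects))     # 1.0

# Object branch: attributes act as domains (only one here, so nothing to align).
print(alignment_penalty(feats, attributes))  # 0.0

# Crude gradient-level analogue: pass made-up per-sample gradients of the
# attribute loss instead of features, again grouped by object.
attr_grads = [[0.2, -0.1], [0.2, -0.1], [0.4, 0.3], [0.4, 0.3]]
print(alignment_penalty(attr_grads, objects))
```

In a real training loop, penalties of this kind would usually be weighted and added to the classification losses. How the paper actually combines its representation-level and gradient-level terms is not stated in the abstract.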
