Paper Title

Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis

Paper Authors

Hao Tang, Xiaojuan Qi, Guolei Sun, Dan Xu, Nicu Sebe, Radu Timofte, Luc Van Gool

Paper Abstract

We propose a novel ECGAN for the challenging semantic image synthesis task. Although considerable improvement has been achieved, the quality of synthesized images is far from satisfactory due to three largely unresolved challenges. 1) Semantic labels do not provide detailed structural information, making it difficult to synthesize local details and structures. 2) The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss and thus cannot fully preserve the original semantic information, leading to semantically inconsistent results. 3) Existing semantic image synthesis methods focus on modeling local semantic information from a single input semantic layout. However, they ignore the global semantic information of multiple input semantic layouts, i.e., semantic cross-relations between pixels across different input layouts. To tackle 1), we propose to use edges as an intermediate representation, which are further adopted to guide image generation via the proposed attention-guided edge transfer module. Edge information is produced by a convolutional generator and introduces detailed structural information. To tackle 2), we design an effective module that selectively highlights class-dependent feature maps according to the original semantic layout to preserve the semantic information. To tackle 3), inspired by current methods in contrastive learning, we propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content than those from different classes. Doing so captures more semantic relations by explicitly exploring the structure of labeled pixels from multiple input semantic layouts. Experiments on three challenging datasets show that our ECGAN achieves significantly better results than state-of-the-art methods.
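The third contribution, pulling together pixel embeddings of the same semantic class while pushing apart those of different classes, follows the general InfoNCE pattern common in contrastive learning. Below is a minimal NumPy sketch of such a pixel-wise objective; the function name `pixel_contrastive_loss`, the cosine-similarity choice, and the temperature value are illustrative assumptions, and the paper's actual loss formulation and positive/negative sampling strategy may differ.

```python
import numpy as np

def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Hypothetical sketch of a pixel-wise contrastive (InfoNCE-style) loss.

    Pixels sharing a semantic label are treated as positives; all other
    pixels serve as negatives.

    embeddings: (N, D) array of pixel embeddings
    labels:     (N,) array of semantic class ids
    """
    # L2-normalize so dot products are cosine similarities
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T / temperature            # (N, N) similarity logits
    n = len(labels)
    # positive-pair mask: same label, excluding each pixel itself
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    losses = []
    for i in range(n):
        if not pos[i].any():
            continue                           # pixel has no positive pair
        others = np.delete(sim[i], i)          # all pairs except self
        log_z = np.log(np.exp(others).sum())   # InfoNCE denominator
        # average the -log(exp(pos)/Z) terms over this pixel's positives
        losses.append((log_z - sim[i][pos[i]]).mean())
    return float(np.mean(losses)) if losses else 0.0
```

With embeddings that cluster by class, the loss is lower than when labels are scrambled, which is the behavior the contrastive objective is meant to enforce on generated image content.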
