Paper Title

Recurrent Affine Transformation for Text-to-image Synthesis

Paper Authors

Senmao Ye, Fei Liu, Minkui Tan

Paper Abstract

Text-to-image synthesis aims to generate natural images conditioned on text descriptions. The main difficulty of this task lies in effectively fusing text information into the image synthesis process. Existing methods usually fuse suitable text information into the synthesis process adaptively with multiple isolated fusion blocks (e.g., Conditional Batch Normalization and Instance Normalization). However, isolated fusion blocks not only conflict with each other but also increase the difficulty of training (see the first page of the supplementary material). To address these issues, we propose a Recurrent Affine Transformation (RAT) for Generative Adversarial Networks that connects all the fusion blocks with a recurrent neural network to model their long-term dependency. In addition, to improve semantic consistency between texts and synthesized images, we incorporate a spatial attention model into the discriminator. Being aware of matching image regions, text descriptions supervise the generator to synthesize more relevant image contents. Extensive experiments on the CUB, Oxford-102 and COCO datasets demonstrate the superiority of the proposed model in comparison to state-of-the-art models. Code: https://github.com/senmaoy/Recurrent-Affine-Transformation-for-Text-to-image-Synthesis.git
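To make the core idea concrete, below is a minimal PyTorch sketch of one RAT-style fusion block, assuming an LSTM cell whose hidden state is threaded through all fusion blocks and projected into per-channel affine parameters. All names (`RATBlock`, `text_dim`, `hidden_dim`) are illustrative assumptions, not the authors' implementation; see the linked repository for the official code.

```python
import torch
import torch.nn as nn

class RATBlock(nn.Module):
    """One recurrent affine fusion block (illustrative sketch).

    An LSTM cell carries a shared recurrent state across all fusion blocks
    in the generator; at each block the hidden state is projected into
    per-channel scale (gamma) and shift (beta) that modulate the feature map.
    """

    def __init__(self, text_dim: int, hidden_dim: int, num_channels: int):
        super().__init__()
        self.rnn = nn.LSTMCell(text_dim, hidden_dim)         # links the fusion blocks
        self.to_gamma = nn.Linear(hidden_dim, num_channels)  # scale from hidden state
        self.to_beta = nn.Linear(hidden_dim, num_channels)   # shift from hidden state

    def forward(self, feat, text_emb, state=None):
        # feat: (B, C, H, W) image features; text_emb: (B, text_dim)
        h, c = self.rnn(text_emb, state)                     # recurrent state update
        gamma = self.to_gamma(h)[:, :, None, None]           # (B, C, 1, 1)
        beta = self.to_beta(h)[:, :, None, None]
        return feat * (1 + gamma) + beta, (h, c)             # channel-wise affine fusion


# Threading one recurrent state through successive fusion blocks, so the
# blocks are no longer isolated but share a long-term dependency:
blocks = [RATBlock(text_dim=256, hidden_dim=256, num_channels=64) for _ in range(3)]
feat = torch.randn(4, 64, 16, 16)
text_emb = torch.randn(4, 256)
state = None
for block in blocks:
    feat, state = block(feat, text_emb, state)
```

The key design choice the sketch illustrates is that consecutive fusion blocks receive the same evolving `(h, c)` state instead of computing their affine parameters independently, which is how the paper ties the blocks together to avoid conflicting modulations.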
