PerceptionGAN: Real-world Image Construction from Provided Text through Perceptual Understanding

Paper Title

PerceptionGAN: Real-world Image Construction from Provided Text through Perceptual Understanding

Authors

Kanish Garg, Ajeet Kumar Singh, Dorien Herremans, Brejesh Lall

Abstract

Generating an image from descriptive text is challenging because the image must incorporate perceptual information (object shapes, colors, and their interactions) while remaining highly relevant to the text. Current methods first generate an initial low-resolution image, which typically has irregular object shapes, colors, and interactions between objects. This initial image is then refined by conditioning on the text. However, these methods mainly address how to use the text representation efficiently when refining the initially generated image, while the success of this refinement depends heavily on the quality of that initial image, as pointed out in the DM-GAN paper. Hence, we propose a method that provides well-initialized images by incorporating perceptual understanding in the discriminator module. We improve the perceptual information at the first stage itself, which results in a significant improvement in the final generated image. In this paper, we apply our approach to the novel StackGAN architecture. We then show that the perceptual information included in the initial image is improved while modeling the image distribution at multiple stages. Finally, we generate realistic multi-colored images conditioned on text. These images are of good quality and contain improved basic perceptual information. More importantly, the proposed method can be integrated into the pipeline of other state-of-the-art text-based image generation models to generate initial low-resolution images. We also improve the refinement process in StackGAN by augmenting the third stage of the generator-discriminator pair in the StackGAN architecture. Our experimental analysis and comparison with the state of the art on MS COCO, a large but sparse dataset, further validate the usefulness of our proposed approach.
