Paper Title
Improving Text to Image Generation using Mode-seeking Function
Paper Authors
Paper Abstract
Generative Adversarial Networks (GANs) have long been used to model the semantic relationship between text and images. However, image generation suffers from mode collapse, in which the generator produces only a few preferred output modes. Our aim is to improve the training of the network by using a specialized mode-seeking loss function that avoids this issue. In text-to-image synthesis, our loss function encourages distinct points in the latent space to generate distinct images. We validate our model on the Caltech-UCSD Birds (CUB) dataset and the Microsoft COCO dataset, varying the weight of the loss function during training. Experimental results demonstrate that our model performs well compared to several state-of-the-art approaches.
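The mode-seeking idea described above can be sketched as a regularization term that penalizes the generator when two different latent codes map to nearly identical images. The sketch below assumes an MSGAN-style ratio loss (image distance over latent distance, inverted so that minimizing it pushes the ratio up); the function name, the L1 distance, and the `eps` stabilizer are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mode_seeking_loss(img1, img2, z1, z2, eps=1e-5):
    """Illustrative mode-seeking regularizer (assumed MSGAN-style form).

    img1, img2: images generated from latent codes z1, z2.
    Minimizing d(z1, z2) / d(img1, img2) encourages distant latent
    codes to produce visually distinct images, counteracting mode
    collapse; eps guards against division by zero when the two
    generated images are identical.
    """
    d_img = np.mean(np.abs(img1 - img2))  # distance in image space
    d_z = np.mean(np.abs(z1 - z2))        # distance in latent space
    return d_z / (d_img + eps)
```

In training, this term would typically be added (with a tunable weight, matching the abstract's "varying the weight of the loss function") to the generator's adversarial loss; collapsed outputs make the denominator small and the penalty large.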