DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

Paper Title

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

Authors

Shi, Jie; Wu, Chenfei; Liang, Jian; Liu, Xiang; Duan, Nan

Abstract

Recently, most successful image synthesis models have been multi-stage processes that combine the advantages of different methods. They always include a VAE-like model that faithfully reconstructs an image from its embedding and a prior model that generates the image embedding. At the same time, diffusion models have shown the capacity to generate high-quality synthetic images. Our work proposes a VQ-VAE architecture model with a diffusion decoder (DiVAE) to serve as the reconstruction component in image synthesis. We explore how to input the image embedding into the diffusion model for excellent performance and find that a simple modification of the diffusion model's UNet can achieve it. Trained on ImageNet, our model achieves state-of-the-art results and, in particular, generates more photorealistic images. In addition, we apply DiVAE with an auto-regressive generator to conditional synthesis tasks to produce more human-feeling and detailed samples.
