Paper Title


Understanding Diffusion Models: A Unified Perspective

Paper Authors

Luo, Calvin

Paper Abstract


Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
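The three prediction targets named in the abstract are linearly related at a fixed noise level. A minimal numerical sketch of that equivalence, using standard DDPM-style notation (`alpha_bar_t` for the cumulative noise schedule; all variable names here are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)    # "original source input"
eps = rng.standard_normal(4)   # "source noise"
alpha_bar_t = 0.6              # noise level at an arbitrary timestep t

# Forward noising: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * eps

# Objective 1: recover the source input x0 from the noisified x_t
x0_hat = (xt - np.sqrt(1 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

# Objective 2: recover the source noise eps from the noisified x_t
eps_hat = (xt - np.sqrt(alpha_bar_t) * x0) / np.sqrt(1 - alpha_bar_t)

# Objective 3: the score of q(x_t | x0) = N(sqrt(ab)*x0, (1-ab)*I),
# i.e. grad_{x_t} log q(x_t | x0)
score = -(xt - np.sqrt(alpha_bar_t) * x0) / (1 - alpha_bar_t)

# Tweedie's formula ties the score back to x0; the score is also a
# rescaled negative of the noise, so any one target determines the others.
assert np.allclose(x0_hat, x0)
assert np.allclose(eps_hat, eps)
assert np.allclose(score, -eps / np.sqrt(1 - alpha_bar_t))
```

Here the quantities are computed in closed form rather than predicted by a network; in training, a neural network would be regressed onto exactly one of these three targets.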
