Paper Title
Multi-Concept Customization of Text-to-Image Diffusion
Paper Authors
Paper Abstract
While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts while enabling fast tuning (~6 minutes). Additionally, we can jointly train for multiple concepts or combine multiple fine-tuned models into one via closed-form constrained optimization. Our fine-tuned model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings. Our method outperforms or performs on par with several baselines and concurrent works in both qualitative and quantitative evaluations while being memory and computationally efficient.
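The abstract's central idea, updating only the parameters of the text-to-image conditioning mechanism (the cross-attention layers) while freezing the rest of the network, can be illustrated with a short sketch. The snippet below is a minimal illustration rather than the authors' released code; it assumes a diffusers-style Stable Diffusion UNet in which cross-attention modules are named `attn2` and expose `to_k`/`to_v` linear projections (these module names come from the diffusers library layout, not from the paper).

```python
# Minimal sketch (not the authors' released code): restrict fine-tuning to the
# cross-attention key/value projections of a text-to-image diffusion UNet.
# Assumes a diffusers-style UNet where cross-attention modules are named
# "attn2" and expose "to_k" / "to_v" linear projections.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet = pipe.unet

trainable = []
for name, param in unet.named_parameters():
    # Only the key/value projections that map text embeddings into the image
    # stream are updated; everything else stays frozen.
    if "attn2.to_k" in name or "attn2.to_v" in name:
        param.requires_grad = True
        trainable.append(param)
    else:
        param.requires_grad = False

optimizer = torch.optim.AdamW(trainable, lr=1e-5)
print(f"Training {sum(p.numel() for p in trainable):,} of "
      f"{sum(p.numel() for p in unet.parameters()):,} UNet parameters")
```

The model-merging step described as "closed-form constrained optimization" can likewise be sketched generically. A constrained least-squares problem of the form min_W ||(W - W0) A||_F^2 subject to W C = V admits a closed-form solution via Lagrange multipliers; the matrices A, C, V and the helper `merge_closed_form` below are illustrative placeholders under that generic formulation, not a reproduction of the paper's exact objective or constraint set.

```python
# Generic closed-form constrained least-squares merge, in the spirit of the
# abstract's "closed-form constrained optimization". Illustrative symbols:
#   W0 : pretrained projection weights                        (m x d)
#   A  : embeddings of regularization text                    (d x n)
#   C  : embeddings of the target concept tokens              (d x k)
#   V  : desired outputs for those tokens, stacked from the
#        individually fine-tuned models                       (m x k)
# Solves  min_W ||(W - W0) A||_F^2   s.t.   W @ C = V.
import numpy as np

def merge_closed_form(W0, A, C, V, eps=1e-6):
    d = A.shape[0]
    G = A @ A.T + eps * np.eye(d)      # Gram matrix of regularization embeddings
    Ginv_C = np.linalg.solve(G, C)     # G^{-1} C, shape (d x k)
    S = C.T @ Ginv_C                   # (k x k)
    # Lagrangian solution: W = W0 + (V - W0 C) S^{-1} C^T G^{-1}
    correction = (V - W0 @ C) @ np.linalg.solve(S, Ginv_C.T)
    return W0 + correction             # satisfies W @ C = V (up to eps)
```

The constraint forces the merged weights to reproduce each fine-tuned model's behavior on its own concept tokens, while the least-squares objective keeps the merged weights close to the pretrained model on generic text, which is why a single set of weights can serve multiple newly added concepts at once.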