Paper Title
Make-A-Video: Text-to-Video Generation without Text-Video Data
Paper Authors
Paper Abstract
We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V model (it does not need to learn visual and multimodal representations from scratch), (2) it does not require paired text-video data, and (3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today's image generation models. We design a simple yet effective way to build on T2I models with novel and effective spatial-temporal modules. First, we decompose the full temporal U-Net and attention tensors and approximate them in space and time. Second, we design a spatial temporal pipeline to generate high resolution and frame rate videos with a video decoder, interpolation model and two super resolution models that can enable various applications besides T2V. In all aspects, spatial and temporal resolution, faithfulness to text, and quality, Make-A-Video sets the new state-of-the-art in text-to-video generation, as determined by both qualitative and quantitative measures.
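The space-time factorization mentioned in the abstract (decomposing the full temporal U-Net and attention tensors and approximating them in space and time) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: a "pseudo-3D" convolution block that applies a 2D spatial convolution per frame followed by a 1D temporal convolution per spatial location. The class name Pseudo3DConv and the identity initialization of the temporal convolution (so training can start from a pretrained T2I model's behavior) are assumptions made for this sketch.

# Illustrative sketch only: factorized space-time convolution, assuming PyTorch.
import torch
import torch.nn as nn


class Pseudo3DConv(nn.Module):
    """Approximate a full 3D conv with a 2D spatial conv followed by a 1D temporal conv."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Spatial convolution over (H, W); in principle this could reuse weights
        # from a pretrained text-to-image model (an assumption for this sketch).
        self.spatial = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        # Temporal convolution over T, initialized to the identity so the block
        # initially behaves like the per-frame image model.
        self.temporal = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        b, c, t, h, w = x.shape
        # Apply the spatial conv to every frame independently.
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        x = self.spatial(x)
        # Apply the temporal conv to every spatial location independently.
        x = x.reshape(b, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        x = self.temporal(x)
        # Restore the (batch, channels, frames, height, width) layout.
        x = x.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)
        return x


if __name__ == "__main__":
    video = torch.randn(2, 64, 8, 32, 32)  # (batch, channels, frames, H, W)
    block = Pseudo3DConv(64)
    print(block(video).shape)  # torch.Size([2, 64, 8, 32, 32])

The same factorization idea carries over to attention in the abstract's description: spatial attention within each frame followed by temporal attention across frames at each spatial position, which is far cheaper than full spatiotemporal attention.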