Paper Title
Recurrent Deconvolutional Generative Adversarial Networks with Application to Text Guided Video Generation
Paper Authors
Paper Abstract
This paper proposes a novel model for video generation and, in particular, addresses the problem of video generation from text descriptions, i.e., synthesizing realistic videos conditioned on given texts. Existing video generation methods cannot be easily adapted to this task, due to the frame-discontinuity issue and their text-free generation schemes. To address these problems, we propose a recurrent deconvolutional generative adversarial network (RD-GAN), which includes a recurrent deconvolutional network (RDN) as the generator and a 3D convolutional neural network (3D-CNN) as the discriminator. The RDN is a deconvolutional version of the conventional recurrent neural network; it can effectively model the long-range temporal dependency of the generated video frames and make good use of the conditional information. The proposed model can be jointly trained by pushing the RDN to generate realistic videos so that the 3D-CNN cannot distinguish them from real ones. We apply the proposed RD-GAN to a series of tasks including conventional video generation, conditional video generation, video prediction, and video classification, and demonstrate its effectiveness by achieving good performance.
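The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of the two components it names: a recurrent deconvolutional generator that unrolls a recurrent state and deconvolves it into one frame per time step, conditioned on a text embedding, and a 3D-CNN discriminator that scores whole clips as real or fake. All layer sizes, the GRU-based recurrence, and the module names (RDN, Discriminator3D) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RDN(nn.Module):
    """Hypothetical recurrent deconvolutional generator: a recurrent state,
    initialized from the text condition and noise, is updated once per time
    step and decoded by transposed convolutions into a 64x64 RGB frame."""
    def __init__(self, text_dim=128, noise_dim=100, hidden_dim=256, num_frames=16):
        super().__init__()
        self.num_frames = num_frames
        self.hidden_dim = hidden_dim
        # Fuse the text embedding and noise into the initial recurrent state.
        self.init_state = nn.Linear(text_dim + noise_dim, hidden_dim * 4 * 4)
        self.rnn = nn.GRUCell(hidden_dim * 4 * 4, hidden_dim * 4 * 4)
        # Deconvolutional decoder: 4x4 feature map -> 64x64 frame.
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(hidden_dim, 128, 4, 2, 1), nn.ReLU(True),  # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(True),          # 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(True),           # 32x32
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                # 64x64
        )

    def forward(self, text_emb, noise):
        b = text_emb.size(0)
        h = self.init_state(torch.cat([text_emb, noise], dim=1))
        x, frames = h, []
        for _ in range(self.num_frames):
            h = self.rnn(x, h)  # the recurrence carries long-range temporal dependency
            x = h
            frames.append(self.decode(h.view(b, self.hidden_dim, 4, 4)))
        # (B, C, T, H, W) layout, as expected by 3D convolutions.
        return torch.stack(frames, dim=2)

class Discriminator3D(nn.Module):
    """Hypothetical 3D-CNN discriminator that judges an entire clip,
    so temporal discontinuities between frames are penalized."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv3d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv3d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(256, 1),
        )

    def forward(self, video):
        return self.net(video)  # (B, 1) real/fake logits

# Usage: generate a text-conditioned clip and score it.
G, D = RDN(), Discriminator3D()
text, z = torch.randn(2, 128), torch.randn(2, 100)
video = G(text, z)   # (2, 3, 16, 64, 64)
score = D(video)     # (2, 1)
```

Training would then alternate the standard GAN updates: the 3D-CNN is optimized to separate real clips from generated ones, while the RDN is optimized to make its clips indistinguishable, which matches the joint training scheme the abstract describes.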