Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

Paper title

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

Authors

Long Zhuo, Guangcong Wang, Shikai Li, Wayne Wu, Ziwei Liu

Abstract

Video-to-Video synthesis (Vid2Vid) has achieved remarkable results in generating a photo-realistic video from a sequence of semantic maps. However, this pipeline suffers from high computational cost and long inference latency, both of which depend largely on two essential factors: 1) network architecture parameters and 2) the sequential data stream. Recently, the parameters of image-based generative models have been significantly compressed via more efficient network architectures. Nevertheless, existing methods mainly focus on slimming network architectures and ignore the size of the sequential data stream. Moreover, because it lacks temporal coherence, image-based compression is not sufficient for compressing video tasks. In this paper, we present a spatial-temporal compression framework, Fast-Vid2Vid, which focuses on the data aspects of generative models. It makes the first attempt along the time dimension to reduce computational resources and accelerate inference. Specifically, we compress the input data stream spatially and reduce the temporal redundancy. After the proposed spatial-temporal knowledge distillation, our model can synthesize key-frames using the low-resolution data stream. Finally, Fast-Vid2Vid interpolates intermediate frames by motion compensation with slight latency. On standard benchmarks, Fast-Vid2Vid achieves near real-time performance of around 20 FPS and reduces computational cost by around 8x on a single V100 GPU.
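
The data flow described in the abstract (spatially compress the input, synthesize only key-frames, then interpolate the frames in between) can be illustrated with a toy sketch. This is not the authors' implementation: every function name and parameter below is hypothetical, the "generator" is a trivial stand-in for a distilled network, linear blending is swapped in as a crude placeholder for motion compensation, and the output stays at the reduced resolution for simplicity. It only shows why fewer generator calls on smaller inputs reduce the work.

```python
# Illustrative sketch only: all names are hypothetical, not taken from the paper.

def downsample(frame, factor):
    """Spatial compression: keep every `factor`-th pixel in each dimension."""
    return [row[::factor] for row in frame[::factor]]

def toy_generator(semantic_map):
    """Stand-in for a distilled generator: maps each label to an intensity."""
    return [[float(label) * 10.0 for label in row] for row in semantic_map]

def blend(frame_a, frame_b, t):
    """Linear blend of two frames; a crude placeholder for motion compensation."""
    return [[(1.0 - t) * a + t * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

def synthesize(semantic_maps, spatial_factor=2, key_stride=2):
    """Run the generator on low-resolution key-frames only, then fill the gaps."""
    if not semantic_maps:
        raise ValueError("need at least one semantic map")
    last = len(semantic_maps) - 1
    key_ids = list(range(0, last + 1, key_stride))
    if key_ids[-1] != last:
        key_ids.append(last)  # end on a key-frame so every gap has two endpoints
    # The expensive step runs once per key-frame, on a downsampled input.
    keys = {i: toy_generator(downsample(semantic_maps[i], spatial_factor))
            for i in key_ids}
    frames = []
    for left, right in zip(key_ids, key_ids[1:]):
        for i in range(left, right):
            t = (i - left) / (right - left)
            frames.append(blend(keys[left], keys[right], t))
    frames.append(keys[key_ids[-1]])
    return frames, key_ids

# Five 4x4 semantic maps, where map k is filled with label k.
maps = [[[k] * 4 for _ in range(4)] for k in range(5)]
frames, key_ids = synthesize(maps)
print(key_ids)      # indices of the frames that went through the generator
print(len(frames))  # total number of output frames
```

With a key-frame stride of 2, only three of the five frames pass through the generator, and each of those inputs has a quarter of the original pixels; the remaining frames come from the cheap blending step.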
