Learning to Compress Videos without Computing Motion

Paper title

Learning to Compress Videos without Computing Motion

Authors

Meixu Chen, Todd Goodall, Anjul Patney, Alan C. Bovik

Abstract

As higher-resolution content and displays proliferate, the sheer volume of video data poses significant challenges to acquiring, transmitting, compressing, and displaying high-quality video. In this paper, we propose a new deep learning video compression architecture that does not require motion estimation, which is the most expensive element of modern hybrid video compression codecs such as H.264 and HEVC. Our framework exploits the regularities inherent to video motion, which we capture by using displaced frame differences as the video representation on which the neural network is trained. In addition, we propose a new space-time reconstruction network based on both an LSTM model and a UNet model, which we call LSTM-UNet. The new video compression framework has three components: a Displacement Calculation Unit (DCU), a Displacement Compression Network (DCN), and a Frame Reconstruction Network (FRN). The DCU removes the need for the motion estimation found in hybrid codecs and is less expensive. In the DCN, an RNN-based network compresses the displaced frame differences while retaining temporal information between frames. The LSTM-UNet is used in the FRN to learn space-time differential representations of videos. Our experimental results show that our compression model, which we call the MOtionless VIdeo Codec (MOVI-Codec), learns to compress videos efficiently without computing motion. As measured by MS-SSIM, MOVI-Codec outperforms the Low-Delay P veryfast setting of the video coding standard H.264 and exceeds the performance of the modern global standard HEVC codec under the same setting, especially on higher-resolution videos. In addition, when assessed using MS-SSIM on high-resolution videos, our network outperforms the latest H.266 (VVC) codec at higher bitrates.
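
To make the idea of displaced frame differences concrete, the following is a minimal sketch of what a displacement calculation step might look like. The abstract does not state which displacements are used or how frame borders are handled, so the five offsets (no shift plus one step up, down, left, and right), the edge replication, and the names `shift_frame` and `displaced_frame_differences` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]  # grayscale frame stored as rows of pixel values


def shift_frame(frame: Frame, dy: int, dx: int) -> Frame:
    """Shift a frame by (dy, dx) pixels, replicating edge pixels.

    Edge replication is an assumption; the abstract does not say how
    borders are treated.
    """
    h, w = len(frame), len(frame[0])
    return [
        [frame[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
         for x in range(w)]
        for y in range(h)
    ]


def displaced_frame_differences(
    current: Frame, previous: Frame, step: int = 1
) -> Dict[Tuple[int, int], Frame]:
    """Subtract fixed, pre-chosen shifts of the previous frame from the current one.

    No search is performed: the offsets are fixed in advance (hypothetical
    choice), so the cost is a handful of subtractions per pixel.
    """
    offsets = [(0, 0), (-step, 0), (step, 0), (0, -step), (0, step)]
    diffs: Dict[Tuple[int, int], Frame] = {}
    for dy, dx in offsets:
        shifted = shift_frame(previous, dy, dx)
        diffs[(dy, dx)] = [
            [c - s for c, s in zip(cur_row, shifted_row)]
            for cur_row, shifted_row in zip(current, shifted)
        ]
    return diffs


if __name__ == "__main__":
    previous = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    current = [[0, 0, 0], [0, 0, 9], [0, 0, 0]]  # bright pixel moved one step right
    for offset, diff in displaced_frame_differences(current, previous).items():
        print(offset, diff)
```

In this toy case the difference for offset `(0, 1)` is all zeros, because that shift happens to match the true motion, while the other offsets leave nonzero residuals. A set of such differences carries motion cues without any block-matching search, which is the kind of regularity the abstract says the network is trained to exploit.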

Paper title