Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos

Paper title

Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos

Authors

Yanhui Guo, Xi Zhang, Xiaolin Wu

Abstract

We propose a novel deep multi-modality neural network for restoring very low bit-rate videos of talking heads. Such video content is very common in social media, teleconferencing, distance education, telemedicine, etc., and often needs to be transmitted over limited bandwidth. The proposed CNN method exploits the correlations among three modalities (video, audio, and the emotional state of the speaker) to remove the video compression artifacts caused by spatial downsampling and quantization. The deep learning approach turns out to be ideally suited to the video restoration task, as the complex non-linear cross-modality correlations are very difficult to model analytically and explicitly. The new method is a video post-processor that can significantly boost the perceptual quality of aggressively compressed talking-head videos while remaining fully compatible with all existing video compression standards.
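
The abstract names two sources of compression artifacts: spatial downsampling and quantization. The following is a minimal sketch of that kind of degradation on a single grayscale frame, assuming block-average downsampling and uniform quantization. The function name `degrade` and the parameters `factor` and `levels` are hypothetical, chosen for illustration only; this is not the authors' codec or any standard's pipeline.

```python
import numpy as np

def degrade(frame, factor=2, levels=8):
    """Toy degradation (illustrative assumption, not the paper's pipeline):
    block-average spatial downsampling followed by uniform quantization."""
    h, w = frame.shape
    # Crop so both dimensions are multiples of the downsampling factor.
    h, w = h - h % factor, w - w % factor
    # Group pixels into factor x factor blocks and average each block.
    blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor)
    small = blocks.mean(axis=(1, 3))
    # Uniform quantization of the 0..255 range into `levels` bins,
    # reconstructing each value at the centre of its bin.
    step = 256.0 / levels
    return np.floor(small / step) * step + step / 2

# A 4x4 ramp frame with intensities 0, 16, ..., 240.
frame = np.arange(16, dtype=np.float64).reshape(4, 4) * 16.0
low = degrade(frame)
print(low)
```

A restoration network of the kind described would take frames degraded in roughly this way as input and try to recover the original detail; here the 4x4 frame collapses to a 2x2 frame with only a few distinct intensity values.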


Paper title