iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks

Paper title

iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks

Authors

Aman Chadha, John Britto, M. Mani Roja

Abstract

Recently, learning-based models have enhanced the performance of single-image super-resolution (SISR). However, applying SISR successively to each video frame leads to a lack of temporal coherency. Convolutional neural networks (CNNs) outperform traditional approaches in terms of image quality metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). However, generative adversarial networks (GANs) offer a competitive advantage because they can mitigate the lack of finer texture details that is usually seen with CNNs when super-resolving at large upscaling factors. We present iSeeBetter, a novel GAN-based spatio-temporal approach to video super-resolution (VSR) that renders temporally consistent super-resolution videos. iSeeBetter extracts spatial and temporal information from the current and neighboring frames using the concept of recurrent back-projection networks as its generator. Furthermore, to improve the "naturality" of the super-resolved image while eliminating artifacts seen with traditional algorithms, we utilize the discriminator from the super-resolution generative adversarial network (SRGAN). Although mean squared error (MSE) as a primary loss-minimization objective improves PSNR/SSIM, these metrics may not capture fine details in the image, resulting in a misrepresentation of perceptual quality. To address this, we use a four-fold (MSE, perceptual, adversarial, and total-variation (TV)) loss function. Our results demonstrate that iSeeBetter offers superior VSR fidelity and surpasses state-of-the-art performance.
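As a rough illustration of the quantities named in the abstract, the sketch below computes MSE, PSNR, and one common anisotropic form of total variation on small grayscale images, then combines four loss terms as a weighted sum. It is not the authors' implementation. The function names, the list-of-rows image representation, the default weights, and the choice to pass the perceptual and adversarial terms in as precomputed scalars are all assumptions made for illustration; in practice those two terms would come from a feature-extraction network and a discriminator.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / err)

def total_variation(img):
    """Anisotropic TV: sum of absolute differences between adjacent pixels."""
    h, w = len(img), len(img[0])
    vertical = sum(abs(img[i + 1][j] - img[i][j])
                   for i in range(h - 1) for j in range(w))
    horizontal = sum(abs(img[i][j + 1] - img[i][j])
                     for i in range(h) for j in range(w - 1))
    return vertical + horizontal

def combined_loss(sr, hr, perceptual, adversarial,
                  weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of MSE, perceptual, adversarial, and TV terms.

    `perceptual` and `adversarial` are precomputed scalars (an assumption of
    this sketch); `weights` are hypothetical, not values from the paper.
    """
    w_mse, w_perc, w_adv, w_tv = weights
    return (w_mse * mse(sr, hr)
            + w_perc * perceptual
            + w_adv * adversarial
            + w_tv * total_variation(sr))

# Toy 2x2 example: hr is the reference frame, sr the super-resolved estimate.
hr = [[0.0, 1.0], [1.0, 0.0]]
sr = [[0.0, 0.5], [1.0, 0.0]]
loss = combined_loss(sr, hr, perceptual=0.2, adversarial=0.1,
                     weights=(1.0, 0.5, 0.5, 0.1))
```

Because PSNR is a monotonic function of MSE, minimizing MSE alone maximizes PSNR. The sketch makes it easy to see why the abstract adds further terms to account for perceptual quality.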
