Paper Title


ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and Interpolation

Paper Authors

Akash Gupta, Abhishek Aich, Amit K. Roy-Chowdhury

Paper Abstract

Existing works address the problem of generating high frame-rate sharp videos by separately learning the frame deblurring and frame interpolation modules. Most of these approaches have a strong prior assumption that all the input frames are blurry, whereas in a real-world setting the quality of frames varies. Moreover, such approaches are trained to perform either of the two tasks - deblurring or interpolation - in isolation, while many practical situations call for both. Different from these works, we address a more realistic problem of high frame-rate sharp video synthesis with no prior assumption that the input is always blurry. We introduce a novel architecture, Adaptive Latent Attention Network (ALANET), which synthesizes sharp high frame-rate videos with no prior knowledge of whether input frames are blurry or not, thereby performing the tasks of both deblurring and interpolation. We hypothesize that information from the latent representations of consecutive frames can be utilized to generate optimized representations for both frame deblurring and frame interpolation. Specifically, we employ a combination of self-attention and cross-attention modules between consecutive frames in the latent space to generate an optimized representation for each frame. The optimized representations learnt using these attention modules help the model to generate and interpolate sharp frames. Extensive experiments on standard datasets demonstrate that our method performs favorably against various state-of-the-art approaches, even though we tackle a much more difficult problem.
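The core idea in the abstract - refining each frame's latent representation with self-attention over itself and cross-attention to the adjacent frame - can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the additive fusion, and the toy latent shapes are all illustrative assumptions; in ALANET the attention modules and fusion are learned inside the network.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention over latent feature vectors.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def fuse_latents(z_curr, z_next):
    # Self-attention refines a frame's representation from its own latent;
    # cross-attention pulls complementary information from the adjacent frame.
    self_att = attention(z_curr, z_curr, z_curr)
    cross_att = attention(z_curr, z_next, z_next)
    # Simple additive fusion (hypothetical; the actual fusion is learned).
    return self_att + cross_att

# Toy latents: 16 spatial positions with 8-dim features per frame.
rng = np.random.default_rng(0)
z0, z1 = rng.standard_normal((2, 16, 8))
fused = fuse_latents(z0, z1)
print(fused.shape)  # (16, 8)
```

The fused representation keeps the shape of the input latent, so a decoder can consume it directly to reconstruct a sharp frame or to synthesize an intermediate one.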
