Paper Title


E-VFIA : Event-Based Video Frame Interpolation with Attention

Paper Authors

Onur Selim Kılıç, Ahmet Akman, A. Aydın Alatan

Paper Abstract


Video frame interpolation (VFI) is a fundamental vision task that aims to synthesize several frames between two consecutive original video images. Most algorithms aim to accomplish VFI using only keyframes, which is an ill-posed problem since keyframes usually do not yield accurate information about the trajectories of the objects in the scene. Event-based cameras, on the other hand, provide more precise information between the keyframes of a video. Some recent state-of-the-art event-based methods approach this problem by utilizing event data for better optical flow estimation to interpolate video frames by warping. Nonetheless, those methods suffer heavily from the ghosting effect. Meanwhile, some kernel-based VFI methods that use only frames as input have shown that deformable convolutions, when backed by transformers, can be a reliable way of dealing with long-range dependencies. We propose event-based video frame interpolation with attention (E-VFIA), a lightweight kernel-based method. E-VFIA fuses event information with standard video frames through deformable convolutions to generate high-quality interpolated frames. The proposed method represents events with high temporal resolution and uses a multi-head self-attention mechanism to better encode event-based information, while being less vulnerable to blurring and ghosting artifacts, thus generating crisper frames. Simulation results show that the proposed technique outperforms current state-of-the-art methods (both frame- and event-based) with a significantly smaller model size.
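The abstract gives only a high-level description of the pipeline, so the following is a minimal PyTorch-style sketch of the general idea: events binned into a voxel grid are encoded with multi-head self-attention over the temporal bins, and the result is fused with keyframe features through a deformable convolution whose offsets are predicted from both feature maps. All module names, shapes, and hyperparameters here are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class EventAttentionEncoder(nn.Module):
    """Encodes the temporal bins of an event voxel grid with self-attention.

    Each pixel's T bin values are treated as a length-T token sequence.
    """
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.embed = nn.Linear(1, dim)                     # lift scalar bin values
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, voxels):                             # voxels: (B, T, H, W)
        B, T, H, W = voxels.shape
        tokens = voxels.permute(0, 2, 3, 1).reshape(B * H * W, T, 1)
        tokens = self.embed(tokens)                        # (B*H*W, T, dim)
        attended, _ = self.attn(tokens, tokens, tokens)    # attention over bins
        feat = self.proj(attended.mean(dim=1))             # pool the temporal axis
        return feat.reshape(B, H, W, -1).permute(0, 3, 1, 2)  # (B, dim, H, W)


class DeformableFusion(nn.Module):
    """Fuses keyframe features with event features via deformable convolution."""
    def __init__(self, dim=64, k=3):
        super().__init__()
        # Offsets for the k*k sampling locations are predicted from both inputs.
        self.offset = nn.Conv2d(2 * dim, 2 * k * k, 3, padding=1)
        self.deform = DeformConv2d(dim, dim, k, padding=k // 2)

    def forward(self, frame_feat, event_feat):             # both: (B, dim, H, W)
        offsets = self.offset(torch.cat([frame_feat, event_feat], dim=1))
        return self.deform(frame_feat, offsets)


# Toy usage on random tensors.
enc, fuse = EventAttentionEncoder(), DeformableFusion()
voxels = torch.randn(1, 8, 32, 32)       # events binned into 8 temporal bins
frame_feat = torch.randn(1, 64, 32, 32)  # features extracted from the keyframes
out = fuse(frame_feat, enc(voxels))
print(out.shape)                          # torch.Size([1, 64, 32, 32])
```

One plausible reading of the design: predicting the deformable-convolution offsets jointly from frame and event features lets the sampling kernel adapt to the motion cues carried by the events, which is the intuition behind a kernel-based fusion avoiding the ghosting that warping-based methods exhibit.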
