Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection

Paper title

Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection

Paper authors

Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, Maja Pantic

Paper abstract

One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression. In this paper, we examine whether we can tackle this issue by harnessing videos of real talking faces, which contain rich information on natural facial appearance and behaviour and are readily available in large quantities online. Our method, termed RealForensics, consists of two stages. First, we exploit the natural correspondence between the visual and auditory modalities in real videos to learn, in a self-supervised cross-modal manner, temporally dense video representations that capture factors such as facial movements, expression, and identity. Second, we use these learned representations as targets to be predicted by our forgery detector along with the usual binary forgery classification task; this encourages it to base its real/fake decision on said factors. We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments, and examine the factors that contribute to its performance. Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
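The second stage described in the abstract combines binary forgery classification with prediction of the learned representations. A minimal sketch of such a combined objective is given below. The abstract does not state the form of the loss, so the per-frame cosine distance, the `weight` parameter, and all function names here are illustrative assumptions rather than the authors' implementation. The target representations are assumed to come from the network trained in the first stage.

```python
import math


def binary_cross_entropy(logit, label):
    """Numerically stable binary cross-entropy for one logit and a 0/1 label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def cosine_distance(pred, target):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_target = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_pred * norm_target)


def combined_loss(logit, label, pred_frames, target_frames, weight=1.0):
    """Classification loss plus a weighted representation-prediction loss.

    pred_frames / target_frames: one vector per video frame (temporally dense).
    weight: hypothetical balancing factor, not taken from the paper.
    """
    # Assumed auxiliary term: mean per-frame cosine distance to the targets.
    distances = [cosine_distance(p, t) for p, t in zip(pred_frames, target_frames)]
    representation_loss = sum(distances) / len(distances)
    return binary_cross_entropy(logit, label) + weight * representation_loss
```

With a single shared detector, the auxiliary term pushes its features toward the stage-one representations, which is one way to encourage the real/fake decision to depend on facial movement, expression, and identity as the abstract describes.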
