Paper Title
Spatio-temporal Features for Generalized Detection of Deepfake Videos
Paper Authors
Paper Abstract
For deepfake detection, video-level detectors have not been explored as extensively as image-level detectors, which do not exploit temporal data. In this paper, we empirically show that existing image- and sequence-based classifiers generalize poorly to new manipulation techniques. To address this, we propose spatio-temporal features, modeled by 3D CNNs, to extend generalization to new kinds of deepfake videos. We show that spatial features learn distinct, deepfake-method-specific attributes, while spatio-temporal features capture attributes shared across deepfake methods. Using the DFDC dataset (arXiv:2006.07397), we provide an in-depth analysis of how sequential and spatio-temporal video encoders exploit temporal information. This analysis reveals that our approach captures local spatio-temporal relations and inconsistencies in deepfake videos, whereas existing sequence encoders are largely insensitive to them. Through large-scale experiments on the FaceForensics++ (arXiv:1901.08971) and DeeperForensics (arXiv:2001.03024) datasets, we show that our approach outperforms existing methods in terms of generalization.
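The abstract describes modeling spatio-temporal features with 3D CNNs, which convolve jointly over the temporal and spatial axes of a face clip. The sketch below is a minimal, hypothetical illustration of such a video-level classifier in PyTorch; the class name, layer sizes, and clip dimensions are assumptions for illustration and not the paper's actual architecture.

```python
# Minimal sketch (illustrative, not the authors' architecture) of a
# video-level deepfake classifier based on 3D convolutions.
import torch
import torch.nn as nn

class SpatioTemporal3DCNN(nn.Module):
    """Toy 3D CNN: input is a face clip of shape (B, C=3, T, H, W)."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),   # joint spatio-temporal kernel
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),           # downsample space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                        # pool over T, H, W
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip)
        return self.classifier(x.flatten(1))

# Usage: a batch of two 16-frame, 112x112 face clips -> real/fake logits.
model = SpatioTemporal3DCNN()
logits = model(torch.randn(2, 3, 16, 112, 112))  # shape (2, 2)
```

Because the 3D kernels span several consecutive frames, this kind of encoder can respond to local temporal inconsistencies (e.g., flicker around the blended face region), which is the property the abstract attributes to spatio-temporal features.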