Paper Title
The DeepFake Detection Challenge (DFDC) Dataset
Paper Authors
Abstract
Deepfakes are a recent off-the-shelf manipulation technique that allows anyone to swap two identities in a single video. In addition to Deepfakes, a variety of GAN-based face swapping methods have also been published with accompanying code. To counter this emerging threat, we have constructed an extremely large face swap video dataset to enable the training of detection models, and organized the accompanying DeepFake Detection Challenge (DFDC) Kaggle competition. Importantly, all recorded subjects agreed to participate in and have their likenesses modified during the construction of the face-swapped dataset. The DFDC dataset is by far the largest publicly available face swap video dataset to date, with over 100,000 total clips sourced from 3,426 paid actors and produced with several Deepfake, GAN-based, and non-learned methods. In addition to describing the methods used to construct the dataset, we provide a detailed analysis of the top submissions from the Kaggle contest. We show that although Deepfake detection is extremely difficult and still an unsolved problem, a Deepfake detection model trained only on the DFDC can generalize to real "in-the-wild" Deepfake videos, and that such a model can be a valuable tool when analyzing potentially Deepfaked videos. The training, validation and testing corpora can be downloaded from https://ai.facebook.com/datasets/dfdc.