Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

Paper title

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

Authors

Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, Nicholas Evans

Abstract

The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data. With this usually being limited, current solutions typically lack generalisation to attacks encountered in the wild. Strategies to improve reliability in the face of uncontrolled, unpredictable attacks are hence needed. We report in this paper our efforts to use self-supervised learning in the form of a wav2vec 2.0 front-end with fine-tuning. Despite initial base representations being learned using only bona fide data and no spoofed data, we obtain the lowest equal error rates reported in the literature for both the ASVspoof 2021 Logical Access and Deepfake databases. When combined with data augmentation, these results correspond to an improvement of almost 90% relative to our baseline system.
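The abstract reports results as equal error rates (EER) and as a relative improvement over a baseline. As a rough illustration of what those two quantities mean, the following minimal sketch computes an approximate EER from detection scores and a relative EER reduction. It is not the authors' evaluation code: the function names, the convention that higher scores mean "more likely bona fide", and all numbers are assumptions made for this example.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: higher scores mean "more likely bona fide".
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a bona fide trial scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Fractional reduction in EER relative to the baseline."""
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical scores, invented for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(bonafide, spoof)

# Hypothetical EERs: a drop from 10% to 1% is a 90% relative improvement.
gain = relative_improvement(10.0, 1.0)
```

Sweeping observed scores as thresholds is a simple approximation. Evaluation toolkits typically interpolate between operating points on the error curve, so their values can differ slightly on small score sets.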
