Fully Automated End-to-End Fake Audio Detection

Paper Title

Fully Automated End-to-End Fake Audio Detection

Authors

Chenglong Wang, Jiangyan Yi, Jianhua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

Abstract

Existing fake audio detection systems often rely on expert experience to design the acoustic features or to hand-tune the hyperparameters of the network structure. However, manual adjustment of these parameters can have a noticeable influence on the results, and it is almost impossible to set the best combination of parameters by hand. This paper therefore proposes a fully automated end-to-end fake audio detection method. We first use a pre-trained wav2vec model to obtain a high-level representation of the speech. For the network structure, we use a modified version of differentiable architecture search (DARTS) named light-DARTS. It learns deep speech representations while automatically learning and optimizing complex neural structures consisting of convolutional operations and residual blocks. Experimental results on the ASVspoof 2019 LA dataset show that the proposed system achieves an equal error rate (EER) of 1.08%, which outperforms the state-of-the-art single system.
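The EER reported in the abstract is the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how such a figure could be computed from detector scores. It assumes that a higher score means "more likely bona fide"; the function name `equal_error_rate` and the toy scores are illustrative only and are not taken from the paper or from the official ASVspoof evaluation tooling.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption: higher score = more likely bona fide.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finite data FAR and FRR rarely match exactly,
            # so report their midpoint at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical): one bona fide and one spoof trial overlap.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On a finite trial set the two error rates seldom cross exactly, which is why the sketch reports the midpoint at the closest threshold; evaluation toolkits typically interpolate instead.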
