Paper Title

Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

Authors

Yongqiang Dou, Haocheng Yang, Maolin Yang, Yanyan Xu, Dengfeng Ke

Abstract

Advances in high-quality playback devices make it urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, but the lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that anti-spoofing models should pay more attention to indistinguishable samples than to easily classified ones during training, so that correct discrimination is the top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, which uses a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of each sample. In the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods in comparison with top-performing systems. Systems trained with the balanced focal loss perform significantly better than those trained with the conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF of 0.0124 and an EER of 0.55%. Furthermore, we present and discuss evaluation results on real replay data in addition to the simulated ASVspoof2019 data, which indicate that anti-spoofing research still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.
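The abstract does not give the exact form of the balanced focal loss, so the following is only a minimal sketch under an assumption: it uses the widely known alpha-balanced focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function names and the values of `alpha` and `gamma` are illustrative and are not taken from the paper or its repository. The sketch shows how the modulating factor shrinks the loss of easily classified samples far more than that of hard ones, compared with plain cross-entropy.

```python
import math


def cross_entropy(p_bonafide, label, eps=1e-12):
    """Binary cross-entropy for one sample.

    p_bonafide: predicted probability that the utterance is bonafide.
    label: 1 for bonafide, 0 for spoofed.
    """
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def balanced_focal_loss(p_bonafide, label, alpha=0.5, gamma=2.0, eps=1e-12):
    """Alpha-balanced focal loss for one sample (assumed standard form).

    alpha weights the bonafide class and (1 - alpha) the spoofed class;
    gamma controls how strongly easy samples are down-weighted.
    Both defaults are illustrative, not the paper's settings.
    """
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy bonafide sample (confidently correct) and a hard one.
    for name, p in [("easy", 0.95), ("hard", 0.30)]:
        ce = cross_entropy(p, 1)
        fl = balanced_focal_loss(p, 1)
        print(f"{name}: cross-entropy={ce:.4f}, focal={fl:.6f}, ratio={fl / ce:.5f}")
```

With `gamma = 0` the modulating factor disappears and the loss reduces to class-weighted cross-entropy, which is one way to check an implementation against a cross-entropy baseline.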
