Paper Title

DeepVOX: Discovering Features from Raw Audio for Speaker Recognition in Non-ideal Audio Signals

Authors

Anurag Chowdhury, Arun Ross

Abstract

Automatic speaker recognition algorithms typically use pre-defined filterbanks, such as Mel-frequency and Gammatone filterbanks, to characterize speech audio. However, it has been observed that the features extracted using these filterbanks are not resilient to diverse audio degradations. In this work, we propose a deep learning-based technique to deduce the filterbank design from vast amounts of speech audio. The purpose of such a filterbank is to extract features that are robust to non-ideal audio conditions, such as degraded, short-duration, and multi-lingual speech. To this end, a 1D convolutional neural network is first designed to learn a time-domain filterbank, called DeepVOX, directly from raw speech audio. Secondly, an adaptive triplet mining technique is developed to efficiently mine the data samples best suited to train the filterbank. Thirdly, a detailed ablation study of the DeepVOX filterbanks reveals the presence of both vocal source and vocal tract characteristics in the extracted features. Experimental results on the VoxCeleb2, NIST SRE 2008, 2010, and 2018, and Fisher speech datasets demonstrate the efficacy of the DeepVOX features across a variety of degraded, short-duration, and multi-lingual speech. The DeepVOX features are also shown to improve the performance of existing speaker recognition algorithms, such as xVector-PLDA and iVector-PLDA.
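
As a rough illustration of the two ideas named in the abstract, the sketch below applies a bank of time-domain filters to raw audio with a 1D convolution and scores embeddings with a generic triplet margin loss. It is not the DeepVOX architecture, and the hardest-negative selection shown is a common baseline rather than the authors' adaptive mining technique. All function names, the random kernels, and the log-energy pooling are assumptions made for this example; in a real system the kernels would be learned by a neural network framework.

```python
import math
import random


def conv1d_valid(signal, kernel, stride=1):
    """Slide `kernel` over `signal` without padding and return the dot products.

    Like "convolution" layers in deep learning, this is a cross-correlation
    (the kernel is not flipped).
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def filterbank_features(signal, kernels, stride=1):
    """Return one log-energy value per filter, pooled over the whole signal.

    Assumption for brevity: a real front end would emit per-frame features.
    """
    feats = []
    for kernel in kernels:
        y = conv1d_valid(signal, kernel, stride)
        energy = sum(v * v for v in y) / len(y)
        feats.append(math.log(energy + 1e-8))
    return feats


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def hardest_negative(anchor, negatives):
    """Pick the negative closest to the anchor (a simple mining baseline)."""
    return min(negatives, key=lambda n: squared_distance(anchor, n))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bank of 4 filters with 16 taps each; random, not learned.
    kernels = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(4)]

    def tone(freq, n=400, rate=8000):
        """Synthetic stand-in for raw speech audio."""
        return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

    anchor = filterbank_features(tone(220), kernels)
    positive = filterbank_features(tone(225), kernels)
    negatives = [filterbank_features(tone(f), kernels) for f in (900, 1800)]
    negative = hardest_negative(anchor, negatives)
    print("triplet loss:", triplet_loss(anchor, positive, negative))
```

Training would adjust the kernel values to reduce this loss over many mined triplets, so that features from the same speaker move closer together than features from different speakers.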
