Learnable Nonlinear Compression for Robust Speaker Verification

Paper title

Learnable Nonlinear Compression for Robust Speaker Verification

Authors

Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Abstract

In this study, we focus on nonlinear compression methods applied to spectral features for deep neural network based speaker verification. We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose a multi-regime (MR) design for the nonlinearities, aimed at improving robustness. Results on VoxCeleb1 and VoxMovies data demonstrate improvements brought by the proposed compression methods over both the commonly used logarithm and their static counterparts, especially for the ones based on the power function. While the CD generalization improves performance on VoxCeleb1, MR provides more robustness on VoxMovies, with a maximum relative equal error rate reduction of 21.6%.
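To make the contrast between static and channel-dependent compression concrete, the following is a minimal sketch and not the authors' implementation. It assumes the power nonlinearity takes the form y = x^a, with a single exponent a in the static case and one exponent per frequency channel in the channel-dependent case; the abstract does not state the exact formula. The function names `log_compress` and `power_compress` and all exponent values are hypothetical. In the paper the parameters are optimized from data, whereas here they are fixed by hand.

```python
import math


def log_compress(spec, eps=1e-6):
    """Static logarithmic compression applied identically to every bin.

    `eps` is an assumed small floor that avoids log(0).
    """
    return [[math.log(x + eps) for x in frame] for frame in spec]


def power_compress(spec, exponents):
    """Power compression with one exponent per frequency channel.

    Assumed form: y[t][c] = spec[t][c] ** exponents[c].
    Passing the same value for every channel gives the static variant.
    """
    n_channels = len(exponents)
    out = []
    for frame in spec:
        if len(frame) != n_channels:
            raise ValueError("each frame needs one value per channel")
        out.append([x ** a for x, a in zip(frame, exponents)])
    return out


# Toy spectrogram: 2 frames x 3 channels of non-negative energies.
spec = [[1.0, 4.0, 9.0], [16.0, 25.0, 36.0]]

# Static variant: the same (hypothetical) exponent for all channels.
static = power_compress(spec, [0.5, 0.5, 0.5])

# Channel-dependent variant: hypothetical hand-picked exponents
# standing in for values that would be learned during training.
channel_dep = power_compress(spec, [1.0, 0.5, 0.0])
```

In a trainable setup the exponents would be model parameters updated by backpropagation together with the speaker embedding network; that part is omitted here.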
