Neural Kalman Filtering for Speech Enhancement

Paper Title

Neural Kalman Filtering for Speech Enhancement

Authors

Xue, Wei; Quan, Gang; Zhang, Chao; Ding, Guohong; He, Xiaodong; Zhou, Bowen

Abstract

Statistical signal processing based speech enhancement methods use expert knowledge to design statistical models and linear filters, which makes them complementary to data-driven methods based on deep neural networks (DNNs). In this paper, we use expert knowledge from statistical signal processing to guide network design and optimization: we extend conventional Kalman filtering (KF) to a supervised learning scheme and propose neural Kalman filtering (NKF) for speech enhancement. Two intermediate clean speech estimates are first produced separately by a recurrent neural network (RNN) and by linear Wiener filtering (WF), and are then linearly combined by a learned NKF gain to yield the NKF output. NKF is trained jointly with supervision so that it learns to automatically trade off between the instantaneous linear estimate made by the WF and the long-term non-linear estimate made by the RNN. The NKF method can be seen as using expert knowledge from WF to regularize the RNN estimates and thereby improve their generalization to noise conditions unseen during training. Experiments under different noise conditions show that the proposed method outperforms the baseline methods in terms of both objective evaluation metrics and automatic speech recognition (ASR) word error rates (WERs).
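
The abstract states only that the NKF output is a linear combination of an RNN estimate and a WF estimate weighted by a learned gain. The Python sketch below illustrates one way to write that step for a single STFT frame; it is not the authors' code. It assumes a Kalman-style update form s = s_rnn + k * (s_wf - s_rnn) with a per-bin gain k in [0, 1], a known noise power spectrum, and a maximum-likelihood a priori SNR estimate for the Wiener gain. The function names `wiener_estimate` and `nkf_combine` are hypothetical, and the RNN estimate and gain are passed in as fixed numbers rather than produced by jointly trained networks.

```python
"""Illustrative sketch of an NKF-style combination for one STFT frame.

Not the authors' implementation. Assumptions (hypothetical):
  * inputs are per-frequency-bin magnitudes of a single frame;
  * the noise power spectrum is known in advance;
  * the RNN estimate and the gain are supplied directly, whereas in NKF
    both come from networks trained jointly with supervision.
"""

from typing import List


def wiener_estimate(noisy_mag: List[float], noise_psd: List[float],
                    floor: float = 1e-12) -> List[float]:
    """Instantaneous linear estimate: apply a per-bin Wiener gain G = xi / (1 + xi)."""
    estimate = []
    for y, n in zip(noisy_mag, noise_psd):
        # Maximum-likelihood a priori SNR: posterior SNR minus one, clipped at zero.
        xi = max(y * y / max(n, floor) - 1.0, 0.0)
        estimate.append(xi / (1.0 + xi) * y)
    return estimate


def nkf_combine(rnn_est: List[float], wf_est: List[float],
                gain: List[float]) -> List[float]:
    """Kalman-style update: move the RNN estimate toward the WF estimate by gain k."""
    if not (len(rnn_est) == len(wf_est) == len(gain)):
        raise ValueError("all inputs must have the same number of bins")
    combined = []
    for s_rnn, s_wf, k in zip(rnn_est, wf_est, gain):
        if not 0.0 <= k <= 1.0:
            raise ValueError("gain must lie in [0, 1]")
        # Equivalent to (1 - k) * s_rnn + k * s_wf.
        combined.append(s_rnn + k * (s_wf - s_rnn))
    return combined


if __name__ == "__main__":
    noisy = [2.0, 1.0, 3.0]  # |Y| per bin
    noise = [1.0, 1.0, 1.0]  # assumed noise power per bin
    rnn = [1.0, 0.2, 2.0]    # stand-in for the RNN's clean-speech estimate
    k = [0.5, 0.0, 1.0]      # stand-in for the learned NKF gain
    wf = wiener_estimate(noisy, noise)
    print(wf)
    print(nkf_combine(rnn, wf, k))
```

Setting k = 0 in a bin keeps the RNN estimate and k = 1 keeps the WF estimate, which mirrors the trade-off the abstract describes between the long-term non-linear RNN estimate and the instantaneous linear WF estimate.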