Paper Title
Enhancement of Coded Speech Using a Mask-Based Post-Filter
Authors
Abstract
The quality of speech codecs deteriorates at low bitrates due to high quantization noise. A post-filter is generally employed to enhance the quality of the coded speech. In this paper, a data-driven post-filter relying on masking in the time-frequency domain is proposed. A fully connected neural network (FCNN), a convolutional encoder-decoder (CED) network, and a long short-term memory (LSTM) network are implemented to estimate a real-valued mask per time-frequency bin. The proposed models were tested on the five lowest operating modes (6.65 kbps to 15.85 kbps) of the Adaptive Multi-Rate Wideband (AMR-WB) codec. Both objective and subjective evaluations confirm the enhancement of the coded speech and show that the mask-based neural network system outperforms a conventional heuristic post-filter of the kind used in standards such as ITU-T G.718.
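To make the idea of a real-valued mask per time-frequency bin concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Hann-windowed STFT with a 512-sample frame and 50% overlap, a clipped magnitude-ratio mask as the training target, and additive noise as a crude stand-in for codec quantization noise; none of these choices are stated in the abstract. In a real system the mask would be predicted by the FCNN, CED, or LSTM from the coded signal alone; here the oracle mask is used in its place.

```python
import numpy as np

FRAME_LEN = 512  # assumed frame length (not given in the abstract)
HOP = 256        # assumed 50% overlap

def stft(x, frame_len=FRAME_LEN, hop=HOP):
    """Short-time Fourier transform using a periodic Hann window."""
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)

def istft(spec, frame_len=FRAME_LEN, hop=HOP):
    """Inverse STFT by weighted overlap-add with the same Hann window."""
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = spec.shape[0]
    out = np.zeros((n_frames - 1) * hop + frame_len)
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    for i in range(n_frames):
        out[i * hop : i * hop + frame_len] += frames[i] * window
        norm[i * hop : i * hop + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)

def oracle_mask(clean_spec, coded_spec, max_gain=2.0):
    """Hypothetical training target: clipped ratio of clean to coded magnitude."""
    ratio = np.abs(clean_spec) / np.maximum(np.abs(coded_spec), 1e-8)
    return np.clip(ratio, 0.0, max_gain)

def apply_mask(coded_spec, mask):
    """Scale each bin's magnitude by the mask while keeping the coded phase."""
    return mask * coded_spec

# Toy signals: a sine wave as "clean" speech, plus noise as the "coded" version.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
coded = clean + 0.1 * rng.standard_normal(len(t))

clean_spec = stft(clean)
coded_spec = stft(coded)

mask = oracle_mask(clean_spec, coded_spec)  # a trained network would predict this
enhanced_spec = apply_mask(coded_spec, mask)
enhanced = istft(enhanced_spec)

# Mean magnitude error against the clean spectrum, before and after masking.
err_coded = np.mean(np.abs(np.abs(coded_spec) - np.abs(clean_spec)))
err_enhanced = np.mean(np.abs(np.abs(enhanced_spec) - np.abs(clean_spec)))
print(f"magnitude error: coded={err_coded:.4f}, enhanced={err_enhanced:.4f}")
```

Because the mask is real-valued and non-negative, it changes only the magnitude of each bin and the phase of the coded signal is reused as-is. The upper clip value `max_gain` is an illustrative parameter that bounds how much any single bin can be amplified.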