SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation using Optimally Smoothed Spectral Mapping

Paper Title

SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation using Optimally Smoothed Spectral Mapping

Authors

Kothapally, Vinay; Xia, Wei; Ghorbani, Shahram; Hansen, John H. L.; Xue, Wei; Huang, Jing

Abstract

The reliability of using fully convolutional networks (FCNs) has been successfully demonstrated by recent studies in many speech applications. One of the most popular variants of these FCNs is the 'U-Net', an encoder-decoder network with skip connections. In this study, we propose 'SkipConvNet', in which we replace each skip connection with multiple convolutional modules to provide the decoder with intuitive feature maps rather than the encoder's output, improving the learning capacity of the network. We also propose the use of optimal smoothing of the power spectral density (PSD) as a pre-processing step, which helps to further enhance the efficiency of the network. To evaluate our proposed system, we use the REVERB challenge corpus to assess the performance of various enhancement approaches under the same conditions. We focus solely on monitoring improvements in speech quality and their contribution to improving the efficiency of back-end speech systems, such as speech recognition and speaker verification, trained only on clean speech. Experimental findings show that the proposed system consistently outperforms other approaches.
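The abstract does not say how the PSD is smoothed. As a rough illustration only, the sketch below applies first-order recursive smoothing to a power spectrogram with a time- and frequency-dependent smoothing factor of the form used in Martin's (2001) minimum-statistics noise estimator, alpha = 1 / (1 + (P_prev / noise_psd - 1)^2). The function name `smooth_psd`, the externally supplied noise estimate `noise_psd`, and the clamping bounds `alpha_min`/`alpha_max` are hypothetical choices for this example and are not taken from the paper; the authors' actual procedure may differ.

```python
from typing import List, Sequence


def smooth_psd(
    power_frames: Sequence[Sequence[float]],
    noise_psd: Sequence[float],
    alpha_min: float = 0.3,   # hypothetical lower bound on the smoothing factor
    alpha_max: float = 0.96,  # hypothetical upper bound on the smoothing factor
) -> List[List[float]]:
    """Recursively smooth a power spectrogram frame by frame (illustrative sketch).

    power_frames[l][k] is the periodogram |Y(l, k)|^2 for frame l and bin k.
    noise_psd[k] is an assumed, externally supplied noise PSD estimate per bin.
    """
    if not power_frames:
        return []
    # Initialise the smoothed PSD with the first periodogram frame.
    smoothed = [list(power_frames[0])]
    for frame in power_frames[1:]:
        prev = smoothed[-1]
        current = []
        for k, power in enumerate(frame):
            # Martin-style smoothing factor: equals 1 when the previous smoothed
            # PSD sits exactly on the noise estimate, and shrinks as it deviates.
            ratio = prev[k] / max(noise_psd[k], 1e-12)
            alpha = 1.0 / (1.0 + (ratio - 1.0) ** 2)
            # Keep alpha inside the assumed [alpha_min, alpha_max] range.
            alpha = min(max(alpha, alpha_min), alpha_max)
            # First-order recursive average of the previous estimate and new power.
            current.append(alpha * prev[k] + (1.0 - alpha) * power)
        smoothed.append(current)
    return smoothed


frames = [[1.0, 1.0], [1.0, 10.0], [1.0, 10.0]]
out = smooth_psd(frames, noise_psd=[1.0, 1.0])
print(out)
```

With a flat noise estimate of 1.0, the second bin's power jumps from 1.0 to 10.0, but the smoothed value rises gradually (1.0, then 1.36, then about 2.35), illustrating how recursive smoothing damps abrupt frame-to-frame fluctuations in the periodogram.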
