On Loss Functions and Recurrency Training for GAN-based Speech Enhancement Systems

Paper Title

On Loss Functions and Recurrency Training for GAN-based Speech Enhancement Systems

Authors

Zhuohuang Zhang, Chengyun Deng, Yi Shen, Donald S. Williamson, Yongtao Sha, Yi Zhang, Hui Song, Xiangang Li

Abstract

Recent work has shown that it is feasible to use generative adversarial networks (GANs) for speech enhancement; however, these approaches have not been compared to state-of-the-art (SOTA) non-GAN-based approaches. Additionally, many loss functions have been proposed for GAN-based approaches, but they have not been adequately compared. In this study, we propose novel convolutional recurrent GAN (CRGAN) architectures for speech enhancement. Multiple loss functions are adopted to enable direct comparisons to other GAN-based systems. The benefits of including recurrent layers are also explored. Our results show that the proposed CRGAN model outperforms the SOTA GAN-based models using the same loss functions, and that it outperforms other non-GAN-based systems, indicating the benefits of using a GAN for speech enhancement. Overall, the CRGAN model that combines an objective metric loss function with the mean squared error (MSE) provides the best performance over comparison approaches across many evaluation metrics.
