Paper title
Deep neural network techniques for monaural speech enhancement: state of the art analysis
Paper authors
Abstract
Deep neural network (DNN) techniques have become pervasive in domains such as natural language processing and computer vision, where they have achieved great success in tasks such as machine translation and image generation. Owing to this success, these data-driven techniques have also been applied in the audio domain. More specifically, DNN models have been applied to monaural speech enhancement to achieve denoising, dereverberation, and multi-speaker separation. In this paper, we review some of the dominant DNN techniques employed to achieve speech separation. The review covers the whole speech enhancement pipeline: feature extraction, how DNN-based tools model both global and local features of speech, and model training (supervised and unsupervised). We also review the use of pre-trained speech-enhancement models to boost the speech enhancement process. The review is geared towards covering the dominant trends in the application of DNNs to the enhancement of monaural speech, that is, speech captured on a single channel.