Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data

Paper Title

Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data

Authors

Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier

Abstract

While deep learning systems have gained significant ground in speech enhancement research, these systems have yet to make use of the full potential of deep learning systems to provide high-level feedback. In particular, phonetic feedback is rare in speech enhancement research even though it includes valuable top-down information. We use the technique of mimic loss to provide phonetic feedback to an off-the-shelf enhancement system, and find gains in objective intelligibility scores on CHiME-4 data. This technique takes a frozen acoustic model trained on clean speech to provide valuable feedback to the enhancement model, even in the case where no parallel speech data is available. Our work is one of the first to show intelligibility improvement for neural enhancement systems without parallel speech data, and we show phonetic feedback can improve a state-of-the-art neural enhancement system trained with parallel speech data.
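The idea of phonetic feedback from a frozen acoustic model can be illustrated with a small sketch. This is not the authors' implementation. The toy linear-plus-softmax acoustic model, the squared-distance comparison of phone posteriors, the function names, and the cross-entropy variant for the case without parallel data are all assumptions made for illustration.

```python
import math


def acoustic_model(frame, weights):
    """Toy frozen acoustic model (hypothetical): linear layer + softmax over phone classes."""
    logits = [sum(w * x for w, x in zip(row, frame)) for row in weights]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mimic_loss(enhanced, clean, weights):
    """Parallel-data case: mean squared distance between the frozen model's
    outputs on enhanced frames and on the matching clean frames."""
    total = 0.0
    for e_frame, c_frame in zip(enhanced, clean):
        p_enh = acoustic_model(e_frame, weights)
        p_cln = acoustic_model(c_frame, weights)
        total += sum((a - b) ** 2 for a, b in zip(p_enh, p_cln))
    return total / len(enhanced)


def phone_classification_loss(enhanced, labels, weights):
    """No-parallel-data case (assumed variant): cross-entropy between the
    frozen model's posteriors on enhanced frames and known phone labels."""
    total = 0.0
    for frame, label in zip(enhanced, labels):
        total -= math.log(acoustic_model(frame, weights)[label])
    return total / len(enhanced)


# Hypothetical two-feature, two-phone example.
WEIGHTS = [[2.0, 0.0], [0.0, 2.0]]   # frozen: never updated during enhancement training
CLEAN = [[1.0, 0.0], [0.0, 1.0]]
ENHANCED = [[0.8, 0.3], [0.2, 0.7]]  # imperfect output of an enhancement model
LABELS = [0, 1]

parallel_feedback = mimic_loss(ENHANCED, CLEAN, WEIGHTS)
label_feedback = phone_classification_loss(ENHANCED, LABELS, WEIGHTS)
```

In a real system either loss would be backpropagated through the frozen acoustic model into the enhancement model's parameters. Only the enhancement model is updated, so the acoustic model acts purely as a source of top-down phonetic information.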
