Paper Title
Self-supervised Learning for Speech Enhancement
Paper Authors
Paper Abstract
Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target. To relax the conditions on the training data, we consider the task of training speech enhancement networks in a self-supervised manner. We first use a limited training set of clean speech sounds and learn a latent representation by autoencoding on their magnitude spectrograms. We then autoencode on speech mixtures recorded in noisy environments and train the resulting autoencoder to share a latent representation with the clean examples. We show that using this training schema, we can now map noisy speech to its clean version using a network that is autonomously trainable without requiring labeled training examples or human intervention.
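Below is a minimal PyTorch sketch of the two-stage training schema described in the abstract: first autoencode clean magnitude spectrograms, then autoencode noisy mixtures while forcing them through the same latent space. The module names, network sizes, and the exact way the latent spaces are tied (here, reusing the frozen clean decoder for the mixture branch) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a magnitude-spectrogram frame to a latent vector."""
    def __init__(self, n_freq=513, n_latent=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq, 256), nn.ReLU(),
            nn.Linear(256, n_latent), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Maps a latent vector back to a magnitude-spectrogram frame."""
    def __init__(self, n_freq=513, n_latent=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_latent, 256), nn.ReLU(),
            nn.Linear(256, n_freq), nn.Softplus(),  # non-negative magnitudes
        )
    def forward(self, z):
        return self.net(z)

# Stage 1: autoencode a limited set of clean speech magnitudes to learn the
# latent representation of clean speech.
clean_enc, clean_dec = Encoder(), Decoder()
opt_clean = torch.optim.Adam(
    list(clean_enc.parameters()) + list(clean_dec.parameters()), lr=1e-3)

def stage1_step(clean_mag):  # clean_mag: (batch, n_freq)
    recon = clean_dec(clean_enc(clean_mag))
    loss = nn.functional.mse_loss(recon, clean_mag)
    opt_clean.zero_grad(); loss.backward(); opt_clean.step()
    return loss.item()

# Stage 2: autoencode noisy mixtures, decoding through the *frozen* clean
# decoder so the mixture encoder is pushed toward the shared clean latent
# space. No paired clean targets are required for this stage.
mix_enc = Encoder()
opt_mix = torch.optim.Adam(mix_enc.parameters(), lr=1e-3)
for p in clean_dec.parameters():
    p.requires_grad_(False)

def stage2_step(noisy_mag):  # noisy_mag: (batch, n_freq)
    recon = clean_dec(mix_enc(noisy_mag))
    loss = nn.functional.mse_loss(recon, noisy_mag)
    opt_mix.zero_grad(); loss.backward(); opt_mix.step()
    return loss.item()

# Inference: map a noisy magnitude spectrogram toward its clean version by
# encoding with the mixture encoder and decoding with the clean decoder.
def enhance(noisy_mag):
    with torch.no_grad():
        return clean_dec(mix_enc(noisy_mag))
```

In this sketch the "shared latent representation" is enforced simply by reusing the clean decoder for the mixture branch; other couplings (e.g., explicit losses between the two latent spaces) would also fit the description in the abstract.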