Paper Title
Sub-Band Knowledge Distillation Framework for Speech Enhancement
Authors
Abstract
In single-channel speech enhancement, methods based on full-band spectral features have been widely studied. However, only a few methods exploit sub-band (non-full-band) spectral features. In this paper, we explore a knowledge distillation framework based on sub-band spectral mapping for single-channel speech enhancement. Specifically, we divide the full frequency band into multiple sub-bands and pre-train an elite-level sub-band enhancement model (teacher model) for each sub-band. Each teacher model is dedicated to processing its own sub-band. Next, under the teacher models' guidance, we train a general sub-band enhancement model (student model) that works for all sub-bands. Without increasing the number of model parameters or the computational complexity, the student model's performance is further improved. To evaluate our proposed method, we conducted extensive experiments on an open-source dataset. The final experimental results show that the guidance from the elite-level teacher models dramatically improves the student model's performance, which surpasses the full-band model while using fewer parameters.
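To make the training scheme concrete, below is a minimal PyTorch sketch of the sub-band knowledge distillation idea described in the abstract. Everything not stated in the abstract is an assumption: the SubBandEnhancer architecture, the sub-band count NUM_BANDS, the band width, the MSE distillation term, and the loss weight ALPHA are all hypothetical placeholders, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical sub-band enhancement model: a small LSTM mapping a noisy
# sub-band magnitude spectrogram to an enhanced one. The paper does not
# specify this architecture; it is a stand-in.
class SubBandEnhancer(nn.Module):
    def __init__(self, band_width: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(band_width, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, band_width)

    def forward(self, x):  # x: (batch, frames, band_width)
        h, _ = self.lstm(x)
        return self.proj(h)

def split_sub_bands(spec, num_bands):
    """Split a full-band spectrogram (batch, frames, freq) into equal sub-bands."""
    return torch.chunk(spec, num_bands, dim=-1)

NUM_BANDS = 4    # assumed number of sub-bands
BAND_WIDTH = 64  # assumed frequency bins per sub-band (256-bin full band)
ALPHA = 0.5      # assumed weight balancing distillation vs. enhancement loss

# One pre-trained, frozen teacher per sub-band; a single shared student
# that must handle every sub-band, as the abstract describes.
teachers = [SubBandEnhancer(BAND_WIDTH).eval() for _ in range(NUM_BANDS)]
student = SubBandEnhancer(BAND_WIDTH)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()

def train_step(noisy_spec, clean_spec):
    """noisy_spec, clean_spec: (batch, frames, NUM_BANDS * BAND_WIDTH)."""
    noisy_bands = split_sub_bands(noisy_spec, NUM_BANDS)
    clean_bands = split_sub_bands(clean_spec, NUM_BANDS)
    loss = 0.0
    for k in range(NUM_BANDS):
        est = student(noisy_bands[k])
        with torch.no_grad():  # teachers only guide; no gradients flow back
            teacher_est = teachers[k](noisy_bands[k])
        # Ground-truth enhancement loss plus a distillation term pulling the
        # student toward the elite teacher's output; the exact combination
        # is an assumption, not taken from the paper.
        loss = loss + mse(est, clean_bands[k]) + ALPHA * mse(est, teacher_est)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random tensors standing in for STFT magnitudes.
noisy = torch.rand(8, 100, NUM_BANDS * BAND_WIDTH)
clean = torch.rand(8, 100, NUM_BANDS * BAND_WIDTH)
print(train_step(noisy, clean))
```

Note how the student's parameter count is independent of the number of sub-bands: one compact model is reused across all bands, which is what lets it match or beat a full-band model with fewer parameters.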