Paper Title
Sub-Band Knowledge Distillation Framework for Speech Enhancement
Authors
Abstract
In single-channel speech enhancement, methods based on full-band spectral features have been widely studied. However, only a few methods exploit sub-band (non-full-band) spectral features. In this paper, we explore a knowledge distillation framework based on sub-band spectral mapping for single-channel speech enhancement. Specifically, we divide the full frequency band into multiple sub-bands and pre-train an elite-level sub-band enhancement model (teacher model) for each sub-band. Each teacher model is dedicated to processing its own sub-band. Next, under the teacher models' guidance, we train a general sub-band enhancement model (student model) that works for all sub-bands. Without increasing the number of model parameters or the computational complexity, the student model's performance is further improved. To evaluate our proposed method, we conducted extensive experiments on an open-source dataset. The final experimental results show that the guidance from the elite-level teacher models dramatically improves the student model's performance, which surpasses the full-band model while using fewer parameters.
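To make the training scheme concrete, below is a minimal PyTorch sketch of the sub-band knowledge distillation idea described in the abstract. Everything not stated in the abstract is an assumption: the SubBandEnhancer architecture, the sub-band count NUM_BANDS, the band width, the MSE distillation term, and the loss weight ALPHA are all hypothetical placeholders, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical sub-band enhancement model: a small LSTM mapping a noisy
# sub-band magnitude spectrogram to an enhanced one. The paper does not
# specify this architecture; it is a stand-in.
class SubBandEnhancer(nn.Module):
    def __init__(self, band_width: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(band_width, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, band_width)

    def forward(self, x):  # x: (batch, frames, band_width)
        h, _ = self.lstm(x)
        return self.proj(h)

def split_sub_bands(spec, num_bands):
    """Split a full-band spectrogram (batch, frames, freq) into equal sub-bands."""
    return torch.chunk(spec, num_bands, dim=-1)

NUM_BANDS = 4    # assumed number of sub-bands
BAND_WIDTH = 64  # assumed frequency bins per sub-band (256-bin full band)
ALPHA = 0.5      # assumed weight balancing distillation vs. enhancement loss

# One pre-trained, frozen teacher per sub-band; a single shared student
# that must handle every sub-band, as the abstract describes.
teachers = [SubBandEnhancer(BAND_WIDTH).eval() for _ in range(NUM_BANDS)]
student = SubBandEnhancer(BAND_WIDTH)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()

def train_step(noisy_spec, clean_spec):
    """noisy_spec, clean_spec: (batch, frames, NUM_BANDS * BAND_WIDTH)."""
    noisy_bands = split_sub_bands(noisy_spec, NUM_BANDS)
    clean_bands = split_sub_bands(clean_spec, NUM_BANDS)
    loss = 0.0
    for k in range(NUM_BANDS):
        est = student(noisy_bands[k])
        with torch.no_grad():  # teachers only guide; no gradients flow back
            teacher_est = teachers[k](noisy_bands[k])
        # Ground-truth enhancement loss plus a distillation term pulling the
        # student toward the elite teacher's output; the exact combination
        # is an assumption, not taken from the paper.
        loss = loss + mse(est, clean_bands[k]) + ALPHA * mse(est, teacher_est)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random tensors standing in for STFT magnitudes.
noisy = torch.rand(8, 100, NUM_BANDS * BAND_WIDTH)
clean = torch.rand(8, 100, NUM_BANDS * BAND_WIDTH)
print(train_step(noisy, clean))
```

Note how the student's parameter count is independent of the number of sub-bands: one compact model is reused across all bands, which is what lets it match or beat a full-band model with fewer parameters.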