Paper Title

FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

Authors

Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu

Abstract

In this paper, we propose FeatherWave, yet another variant of the WaveRNN vocoder that combines multi-band signal processing with linear predictive coding. LPCNet, a recently proposed neural vocoder that exploits the linear predictive characteristics of the speech signal within the WaveRNN architecture, can generate high-quality speech faster than real time on a single CPU core. However, LPCNet is still not efficient enough for online speech generation tasks. To address this issue, we adopt multi-band linear predictive coding for the WaveRNN vocoder. The multi-band method enables the model to generate several speech samples in parallel at each step, which significantly improves the efficiency of speech synthesis. The proposed model with 4 sub-bands needs less than 1.6 GFLOPS for speech generation. In our experiments, it generates 24 kHz high-fidelity audio 9x faster than real time on a single CPU, which is much faster than the LPCNet vocoder. Furthermore, our subjective listening test shows that FeatherWave generates speech of better quality than LPCNet.
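
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the order-2 coefficients, and the assumption that each of the N sub-bands is decimated to 1/N of the full sample rate are all illustrative choices. The first function shows linear prediction as a weighted sum of past samples; the second shows why emitting one sample per sub-band at each step reduces the number of sequential network steps.

```python
import numpy as np

def lpc_predict(history, coeffs):
    """Predict the next sample as a weighted sum of the most recent samples.

    history: 1-D array of past samples, most recent last.
    coeffs:  coeffs[k] weights the sample k+1 steps in the past.
    (Hypothetical helper for illustration only.)
    """
    order = len(coeffs)
    recent = history[-order:][::-1]  # most recent sample first
    return float(np.dot(coeffs, recent))

def steps_per_second(sample_rate, num_bands):
    """Sequential steps needed per second of audio.

    Assumes each sub-band runs at sample_rate / num_bands and that one
    sample per sub-band is produced at every step.
    """
    return sample_rate // num_bands

# Order-2 predictor with made-up coefficients: 2*s[t-1] - 1*s[t-2]
history = np.array([0.0, 1.0, 2.0])
coeffs = np.array([2.0, -1.0])
prediction = lpc_predict(history, coeffs)

full_band_steps = steps_per_second(24000, 1)   # one sample per step
four_band_steps = steps_per_second(24000, 4)   # four samples per step
```

Under these assumptions, a 24 kHz signal split into 4 sub-bands needs a quarter of the sequential steps of a full-band model, which is the source of the efficiency gain the abstract describes.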
