FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

Paper title

FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

Authors

Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu

Abstract

In this paper, we propose FeatherWave, yet another variant of the WaveRNN vocoder, which combines multi-band signal processing with linear predictive coding. LPCNet, a recently proposed neural vocoder that exploits the linear predictive characteristics of the speech signal within the WaveRNN architecture, can generate high-quality speech faster than real time on a single CPU core. However, LPCNet is still not efficient enough for online speech generation tasks. To address this issue, we adopt multi-band linear predictive coding for the WaveRNN vocoder. The multi-band method enables the model to generate several speech samples in parallel at each step, which significantly improves the efficiency of speech synthesis. The proposed model with 4 sub-bands needs less than 1.6 GFLOPS for speech generation. In our experiments, it generates 24 kHz high-fidelity audio 9x faster than real time on a single CPU, which is much faster than the LPCNet vocoder. Furthermore, our subjective listening test shows that FeatherWave generates speech with better quality than LPCNet.
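
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `lpc_predict` and `steps_per_second` are hypothetical, the sign convention for the prediction coefficients is an assumption, and the step count assumes critically sampled sub-bands, so that each band runs at the full sample rate divided by the number of bands.

```python
def lpc_predict(history, coeffs):
    """Linear prediction: estimate the next sample as a weighted sum
    of the most recent samples.

    history: list of past samples, oldest first.
    coeffs:  list a_1..a_M, where a_1 weights the most recent sample
             (assumed convention; some texts negate the coefficients).
    """
    order = len(coeffs)
    recent = history[-order:][::-1]  # most recent sample first
    return sum(a * s for a, s in zip(coeffs, recent))


def steps_per_second(sample_rate, num_bands):
    """Autoregressive steps needed per second of audio when every step
    emits one sample for each sub-band (critical sampling assumed)."""
    return sample_rate // num_bands


# Full-band generation at 24 kHz needs one step per output sample.
full_band_steps = steps_per_second(24000, 1)

# With 4 sub-bands, each step yields 4 samples, one per band.
multi_band_steps = steps_per_second(24000, 4)

# Second-order prediction from the two most recent samples (3.0 and 2.0).
prediction = lpc_predict([1.0, 2.0, 3.0], [0.5, 0.25])
```

Under these assumptions, the 4-band configuration runs the recurrent network a quarter as many times per second of audio as a full-band model, which is the source of the efficiency gain the abstract describes.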
