Paper title
FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction
Paper authors
Paper abstract
In this paper, we propose FeatherWave, a variant of the WaveRNN vocoder that combines multi-band signal processing with linear predictive coding. LPCNet, a recently proposed neural vocoder that exploits the linear predictive characteristics of speech signals within the WaveRNN architecture, can generate high-quality speech faster than real time on a single CPU core. However, LPCNet is still not efficient enough for online speech generation tasks. To address this issue, we apply multi-band linear predictive coding to the WaveRNN vocoder. The multi-band method enables the model to generate several speech samples in parallel at each step, which significantly improves the efficiency of speech synthesis. The proposed model with 4 sub-bands needs less than 1.6 GFLOPS for speech generation. In our experiments, it generates 24 kHz high-fidelity audio 9x faster than real time on a single CPU, which is much faster than the LPCNet vocoder. Furthermore, our subjective listening test shows that FeatherWave generates speech of better quality than LPCNet.
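The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 2-tap coefficients, and the assumption that each autoregressive step emits exactly one sample per sub-band are hypothetical choices made for illustration only.

```python
import numpy as np


def lpc_predict(history, coeffs):
    """Linear prediction p_t = sum_k a_k * s_{t-k}.

    history[-1] is the most recent sample s_{t-1}; coeffs holds a_1..a_K.
    (Hypothetical helper, not taken from the paper.)
    """
    order = len(coeffs)
    recent = history[-order:][::-1]  # s_{t-1}, s_{t-2}, ..., s_{t-K}
    return float(np.dot(coeffs, recent))


def autoregressive_steps(sample_rate_hz, num_bands):
    """Sequential steps per second of audio, ASSUMING each step emits
    one sample for every sub-band in parallel."""
    return sample_rate_hz // num_bands


# Illustrative values only: a 2-tap predictor that extrapolates a straight line.
history = np.array([0.0, 1.0, 2.0, 3.0])
coeffs = np.array([2.0, -1.0])  # a_1, a_2
prediction = lpc_predict(history, coeffs)  # 2*3 - 1*2

full_band_steps = autoregressive_steps(24000, 1)
four_band_steps = autoregressive_steps(24000, 4)
print(prediction, full_band_steps, four_band_steps)
```

Under these assumptions, a 4-band model runs a quarter as many sequential steps per second of 24 kHz audio as a full-band model, which is the source of the efficiency gain the abstract describes. In an LPC-based vocoder, the network then only has to model the residual between the linear prediction and the actual sample.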