Paper title
NESC: Robust Neural End-2-End Speech Coding with GANs
Paper authors
Paper abstract
Neural networks have proven to be a formidable tool for tackling the problem of speech coding at very low bit rates. However, designing a neural coder that operates robustly under real-world conditions remains a major challenge. We therefore present the Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The encoder uses a new architecture configuration that relies on our proposed Dual-PathConvRNN (DPCRNN) layer, while the decoder architecture is based on our previous work, Streamwise-StyleMelGAN. Our subjective listening tests on clean and noisy speech show that NESC is particularly robust to unseen conditions and signal perturbations.