Towards Error-Resilient Neural Speech Coding

Paper Title

Towards Error-Resilient Neural Speech Coding

Authors

Huaying Xue, Xiulian Peng, Xue Jiang, Yan Lu

Abstract

Neural audio coding has recently shown very promising results in the literature, largely outperforming traditional codecs, but limited attention has been paid to its error resilience. Neural codecs trained with only source coding in mind tend to be extremely sensitive to channel noise, especially in wireless channels with high error rates. In this paper, we investigate how to improve the error resilience of neural audio codecs against the packet losses that often occur during real-time communications. We propose a feature-domain packet loss concealment algorithm (FD-PLC) for real-time neural speech coding. Specifically, we introduce a self-attention-based module on the received latent features to recover lost frames in the feature domain before the decoder. A hybrid segment-level and frame-level frequency-domain discriminator is employed to guide the network to focus on both the generative quality of lost frames and their continuity with neighbouring frames. Experimental results on several error patterns show that the proposed scheme can achieve better robustness compared with the corresponding error-free and error-resilient baselines. We also show that feature-domain concealment is superior to its waveform-domain counterpart applied as post-processing.
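The idea of recovering lost frames in the feature domain with self-attention can be illustrated with a small sketch. This is not the authors' FD-PLC network: it is an untrained, single-head toy that assumes sinusoidal positional encodings as queries and keys, uses the received latent frames as values, and assumes a causal mask (each frame sees only current and past frames) to mimic a real-time constraint. The names `positional_encoding` and `conceal_lost_frames` are hypothetical, and the discriminator-based training described in the abstract is not modelled.

```python
import numpy as np

def positional_encoding(num_frames, dim):
    """Standard sinusoidal positional encoding, shape (num_frames, dim)."""
    pos = np.arange(num_frames)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def conceal_lost_frames(features, lost):
    """Fill lost latent frames with an attention-weighted mix of received ones.

    features: (T, D) array of latent frames (content of lost rows is ignored).
    lost:     (T,) boolean mask, True where the packet for that frame was lost.
    Assumption: attention scores come from positional similarity only; a real
    model would use learned query/key/value projections.
    """
    features = np.asarray(features, dtype=float)
    lost = np.asarray(lost, dtype=bool)
    num_frames, dim = features.shape

    pe = positional_encoding(num_frames, dim)
    scores = pe @ pe.T / np.sqrt(dim)  # (T, T) scaled dot-product scores

    # A query frame may attend only to received frames at or before its own position.
    visible = np.tril(np.ones((num_frames, num_frames), dtype=bool)) & ~lost[None, :]
    scores = np.where(visible, scores, -np.inf)

    values = np.where(lost[:, None], 0.0, features)  # never read lost content
    out = features.copy()
    for t in np.flatnonzero(lost):
        if not visible[t].any():
            out[t] = 0.0  # nothing received yet: fall back to a zero frame
            continue
        weights = np.exp(scores[t] - scores[t][visible[t]].max())
        weights /= weights.sum()
        out[t] = weights @ values
    return out
```

In a full system, the concealed feature sequence would then be passed to the neural decoder, so that concealment happens before waveform synthesis rather than as a waveform-domain post-process.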
