Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET

Paper title

Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET

Authors

Arijit Biswas, Guanxin Jiang

Abstract

Automatic coded audio quality predictors are typically designed to evaluate single channels without considering any spatial aspects. With InSE-NET [1], we demonstrated that a deep neural network (DNN) can mimic a state-of-the-art coded audio quality metric (ViSQOL-v3 [2]) and subsequently improve on it, using only programmatically generated data. In this study, we take steps towards building a DNN-based coded stereo audio quality predictor and propose an extension of InSE-NET for handling stereo signals. The design accounts for stereo/spatial aspects by conditioning the model on the left, right, mid, and side channels; we name the resulting model Stereo InSE-NET. By transferring selected weights from the pre-trained mono InSE-NET and retraining with both real and synthetically augmented listening tests, we demonstrate significant improvements of 12% and 6% in the Pearson and Spearman rank correlation coefficients, respectively, over the latest ViSQOL-v3 [3].
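The abstract names two concrete ingredients: a left/right/mid/side view of the stereo signal and evaluation by Pearson and Spearman rank correlation against listening-test scores. The following is a minimal stdlib-only sketch of both, assuming the common mid/side convention M = (L + R)/2 and S = (L - R)/2. The paper's exact scaling, and the features InSE-NET computes from each channel, are not given here. The function names `lrms_channels`, `pearson`, `average_ranks`, and `spearman` are hypothetical and do not come from the authors' code.

```python
import math
from typing import List, Sequence


def lrms_channels(left: Sequence[float], right: Sequence[float]) -> List[List[float]]:
    """Stack [L, R, M, S]; ASSUMED convention M=(L+R)/2, S=(L-R)/2."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]   # content shared by both channels
    side = [(l - r) / 2.0 for l, r in zip(left, right)]  # inter-channel difference (spatial cue)
    return [list(left), list(right), mid, side]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(vx * vy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical listening-test scores and model predictions, for illustration only.
    listening = [4.5, 3.0, 2.0, 1.2]
    predicted = [4.1, 3.3, 2.5, 1.0]
    print(pearson(listening, predicted), spearman(listening, predicted))
```

Spearman depends only on rank order, so any monotonic relationship between predictions and listening-test scores yields 1.0, while Pearson additionally rewards linear agreement. This is why quality predictors are commonly reported with both coefficients.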
