Speech Quality Assessment through MOS using Non-Matching References

Paper Title

Speech Quality Assessment through MOS using Non-Matching References

Authors

Pranay Manocha, Anurag Kumar

Abstract

Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way to assess the quality of speech signals. However, several recent attempts to automatically estimate MOS using deep learning approaches lack robustness and generalization capabilities, limiting their use in real-world applications. In this work, we present a novel framework, NORESQA-MOS, for estimating the MOS of a speech signal. Unlike prior works, our approach uses non-matching references as a form of conditioning to ground the MOS estimation by neural networks. We show that NORESQA-MOS provides better generalization and more robust MOS estimation than previous state-of-the-art methods such as DNSMOS and NISQA, even though we use a smaller training set. Moreover, we also show that our generic framework can be combined with other learning methods such as self-supervised learning and can further supplement the benefits from these methods.
