ASR-Free Pronunciation Assessment

Paper Title

ASR-Free Pronunciation Assessment

Authors

Sitong Cheng, Zhixin Liu, Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng

Abstract

Most pronunciation assessment methods are based on local features derived from automatic speech recognition (ASR), e.g., the Goodness of Pronunciation (GOP) score. In this paper, we investigate an ASR-free scoring approach that is derived from the marginal distribution of raw speech signals. The hypothesis is that even with no knowledge of the language (and so no ability to recognize phones or words), we can still tell how good a pronunciation is by comparing it with some speech data from the target language. Our analysis shows that this new scoring approach provides an interesting correction for the phone-competition problem of GOP. Experimental results on the ERJ dataset demonstrate that combining the ASR-free score with GOP achieves better performance than the GOP baseline.
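The idea of scoring from a marginal distribution can be illustrated with a small sketch. The abstract does not say which density model or features the authors use, so everything here is an assumption: a diagonal Gaussian stands in for the model of target-language speech, the 13-dimensional frames are synthetic, and the names `fit_diag_gaussian`, `asr_free_score`, and `combine` (with its linear weighting) are hypothetical. The point is only that the score needs no phone or word labels.

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Fit a diagonal Gaussian to feature frames of shape (n_frames, dim).

    Assumption: a stand-in density model, not the one used in the paper.
    """
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # small floor for numerical stability
    return mean, var

def asr_free_score(frames, mean, var):
    """Average per-frame log-likelihood log p(x) under the fitted model."""
    log_prob = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)
    return float(log_prob.sum(axis=1).mean())

def combine(gop, asr_free, weight=0.5):
    """Hypothetical linear fusion of a GOP score and an ASR-free score."""
    return weight * gop + (1 - weight) * asr_free

# Synthetic data: "native" frames, and two test utterances whose frames
# lie close to or far from the native distribution.
rng = np.random.default_rng(0)
native = rng.normal(0.0, 1.0, size=(5000, 13))
mean, var = fit_diag_gaussian(native)

close = rng.normal(0.0, 1.0, size=(200, 13))
far = rng.normal(2.0, 1.0, size=(200, 13))

score_close = asr_free_score(close, mean, var)
score_far = asr_free_score(far, mean, var)
print(score_close > score_far)
```

In this toy setting the utterance whose frames resemble the native data receives the higher score. In practice the two scores would need to be put on comparable scales before fusion, and the weight would be tuned on held-out data.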
