Paper Title
DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech
Paper Authors
Paper Abstract
When beginners learn to speak a non-native language, it is difficult for them to judge for themselves whether they are speaking well. Therefore, computer-assisted pronunciation training systems are used to detect learner mispronunciations. These systems typically compare the user's speech with that of a specific native speaker, used as a model, in units of rhythm, phonemes, or words and calculate the differences. However, they either require extensive speech data with detailed annotations or can only compare against one specific native speaker. To overcome these problems, we propose a new language learning support system that calculates speech scores and detects beginners' mispronunciations from a small amount of unannotated speech data, without comparison to a specific person. The proposed system uses deep learning-based speech processing to display, in an intuitive visual manner, the pronunciation score of the learner's speech and the difference/distance between the learner's pronunciation and that of a group of models. Learners can gradually improve their pronunciation by eliminating differences and shortening the distance from the models until they become sufficiently proficient. Furthermore, since the pronunciation score and difference/distance are not calculated against specific sentences spoken by a particular model, users are free to practice any sentences they wish. We also built an application to help non-native speakers learn English and confirmed that it can improve users' speech intelligibility.