Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts

Paper title

Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts

Authors

Matthew Perez, Zakaria Aldeneh, Emily Mower Provost

Abstract

Robust speech recognition is a key prerequisite for semantic feature extraction in automatic aphasic speech analysis. However, standard one-size-fits-all automatic speech recognition models perform poorly when applied to aphasic speech. One reason for this is the wide range of speech intelligibility due to different levels of severity (i.e., higher severity lends itself to less intelligible speech). To address this, we propose a novel acoustic model based on a mixture of experts (MoE), which handles the varying intelligibility stages present in aphasic speech by explicitly defining severity-based experts. At test time, the contribution of each expert is decided by estimating speech intelligibility with a speech intelligibility detector (SID). We show that our proposed approach significantly reduces phone error rates across all severity stages in aphasic speech compared to a baseline approach that does not incorporate severity information into the modeling process.
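
The test-time weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the expert outputs are combined, so this assumes a softmax over SID scores followed by a weighted sum of per-expert phone posteriors. The names `mix_experts` and `sid_scores`, and all numbers, are hypothetical and not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_experts(expert_posteriors, sid_scores):
    """Combine per-expert phone posteriors for a single frame.

    expert_posteriors: one list of phone probabilities per severity expert.
    sid_scores: one unnormalized score per expert (assumed SID output format).
    """
    # Assumption: SID scores are turned into mixture weights with a softmax.
    weights = softmax(sid_scores)
    n_phones = len(expert_posteriors[0])
    # Weighted sum over experts, computed separately for each phone.
    return [
        sum(w * post[p] for w, post in zip(weights, expert_posteriors))
        for p in range(n_phones)
    ]

# Toy example: three severity-based experts, three phone classes.
experts = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
sid_scores = [2.0, 0.5, -1.0]  # made-up scores favoring the first expert
mixed = mix_experts(experts, sid_scores)
```

Because the weights sum to one and each expert outputs a probability distribution, the mixed output is again a distribution over phones, dominated here by the first expert.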
