Unsupervised domain adaptation for speech recognition with unsupervised error correction

Paper title

Unsupervised domain adaptation for speech recognition with unsupervised error correction

Authors

Mai, Long; Carson-Berndsen, Julie

Abstract

The transcription quality of automatic speech recognition (ASR) systems degrades significantly when transcribing audio from unseen domains. We propose an unsupervised error correction method for unsupervised ASR domain adaptation, aiming to recover transcription errors caused by domain mismatch. Unlike existing correction methods that rely on transcribed audio for training, our approach requires only unlabeled data from the target domains, to which a pseudo-labeling technique is applied to generate correction training samples. To reduce over-fitting to the pseudo data, we also propose an encoder-decoder correction model that can take into account additional information such as dialogue context and acoustic features. Experimental results show that our method obtains a significant word error rate (WER) reduction over non-adapted ASR systems. The correction model can also be applied on top of other adaptation approaches to bring an additional relative improvement of 10%.
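The abstract reports its results as WER and as a relative improvement, so a small illustration of both quantities may help. The sketch below assumes the standard definition of WER (word-level edit distance divided by the number of reference words) and the usual meaning of relative reduction. The function names and sample sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentences: one deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical WERs: going from 20% to 18% is a 10% relative reduction.
    print(relative_reduction(0.20, 0.18))
```

Under these definitions, a 10% relative improvement means the WER falls to nine tenths of its previous value, which differs from a drop of 10 absolute percentage points.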

Paper title