Paper Title
Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss
Paper Authors
Abstract
Code-Switching (CS) remains a challenge for Automatic Speech Recognition (ASR), especially for character-based models. With characters drawn from multiple languages, the output of character-based models suffers from phoneme duplication, resulting in language-inconsistent spellings. We propose the Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistency in a character-based non-autoregressive ASR model, which allows for faster inference. The CCTC loss conditions the main prediction on the predicted contexts to ensure language consistency in the spellings. In contrast to existing CTC-based approaches, the CCTC loss does not require frame-level alignments, since the context ground truth is obtained from the model's estimated path. Compared to the same model trained with the regular CTC loss, our method consistently improves ASR performance on both CS and monolingual corpora.
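To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of a CCTC-style objective: the standard CTC loss on the main head, plus an auxiliary cross-entropy term that asks extra context heads to predict the neighboring non-blank characters along the model's own greedy CTC path (so no frame-level alignment is needed). The function and argument names (`cctc_loss`, `ctx_left_logits`, `ctx_right_logits`, `ctx_weight`), the use of only the immediate left/right context, and the omission of repeat-collapsing are simplifying assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cctc_loss(log_probs, ctx_left_logits, ctx_right_logits,
              targets, input_lengths, target_lengths,
              blank=0, ctx_weight=0.1):
    """Illustrative CCTC-style loss (assumed interface, not the official code).

    log_probs:        (T, N, C) log-softmax outputs of the main CTC head
    ctx_left_logits:  (T, N, C) logits of a head predicting the previous character
    ctx_right_logits: (T, N, C) logits of a head predicting the next character
    targets, input_lengths, target_lengths: as expected by F.ctc_loss
    """
    # 1) Regular CTC loss on the main head (no frame-level alignment needed).
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=blank, zero_infinity=True)

    # 2) Greedy best path from the model itself; its non-blank neighbors serve
    #    as pseudo ground truth for the context heads.
    best_path = log_probs.argmax(dim=-1)  # (T, N), non-differentiable targets
    ctx_ce = log_probs.new_zeros(())
    n_terms = 0
    for n in range(best_path.shape[1]):
        frames = best_path[: input_lengths[n], n]
        non_blank = (frames != blank).nonzero(as_tuple=True)[0]
        for i, t in enumerate(non_blank.tolist()):
            # Left-context target: previous non-blank label on the best path.
            if i > 0:
                left_tgt = frames[non_blank[i - 1]]
                ctx_ce = ctx_ce + F.cross_entropy(
                    ctx_left_logits[t, n].unsqueeze(0), left_tgt.unsqueeze(0))
                n_terms += 1
            # Right-context target: next non-blank label on the best path.
            if i < len(non_blank) - 1:
                right_tgt = frames[non_blank[i + 1]]
                ctx_ce = ctx_ce + F.cross_entropy(
                    ctx_right_logits[t, n].unsqueeze(0), right_tgt.unsqueeze(0))
                n_terms += 1
    if n_terms > 0:
        ctx_ce = ctx_ce / n_terms

    # Total loss: CTC term plus a weighted context-consistency term.
    return ctc + ctx_weight * ctx_ce
```

In this sketch, the context targets come from `argmax` over the model's own outputs, so they act as self-generated labels rather than aligned ground truth; the `ctx_weight` hyperparameter balancing the two terms is likewise an assumed placeholder.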