Improving Proper Noun Recognition in End-to-End ASR by Customization of the MWER Loss Criterion

Paper Title

Improving Proper Noun Recognition in End-to-End ASR By Customization of the MWER Loss Criterion

Authors

Cal Peyser, Tara N. Sainath, Golan Pundak

Abstract

Proper nouns present a challenge for end-to-end (E2E) automatic speech recognition (ASR) systems in that a particular name may appear only rarely during training, and may have a pronunciation similar to that of a more common word. Unlike conventional ASR models, E2E systems lack an explicit pronunciation model that can be specifically trained with proper noun pronunciations and a language model that can be trained on a large text-only corpus. Past work has addressed this issue by incorporating additional training data or additional models. In this paper, we instead build on recent advances in minimum word error rate (MWER) training to develop two new loss criteria that specifically emphasize proper noun recognition. Unlike past work on this problem, this method requires no new data during training or external models during inference. We see improvements ranging from 2% to 7% relative on several relevant benchmarks.
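
The abstract does not spell out the two loss criteria, so the following is only a minimal sketch of the general idea, not the authors' formulation. It computes the usual MWER-style objective over an N-best list (expected word errors under renormalized hypothesis probabilities, with the mean error subtracted as a baseline) and adds a hypothetical extra cost when a hypothesis misses a proper noun from the reference. The function names, the `proper_nouns` set, and the `penalty` rule are all assumptions introduced for illustration; in a real system the log-probabilities would be differentiable model outputs rather than plain floats.

```python
import math


def word_errors(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,               # drop hypothesis word h
                           cur[j - 1] + 1,            # add reference word r
                           prev[j - 1] + (h != r)))   # substitute or match
        prev = cur
    return prev[-1]


def mwer_loss(log_probs, hyps, ref, proper_nouns=frozenset(), penalty=0.0):
    """Expected (baseline-subtracted) cost over an N-best list.

    ASSUMPTION: `penalty` is added to a hypothesis' cost whenever it lacks
    a reference word listed in `proper_nouns`. This rule is illustrative
    only and is not taken from the paper.
    """
    # Renormalize the hypothesis probabilities over the N-best list.
    top = max(log_probs)
    exps = [math.exp(lp - top) for lp in log_probs]
    total = sum(exps)
    probs = [e / total for e in exps]

    costs = []
    for hyp in hyps:
        cost = float(word_errors(hyp, ref))
        if any(w in proper_nouns and w not in hyp for w in ref):
            cost += penalty  # hypothetical proper-noun emphasis
        costs.append(cost)

    mean_cost = sum(costs) / len(costs)
    return sum(p * (c - mean_cost) for p, c in zip(probs, costs))


ref = ["call", "golan"]
hyps = [["call", "golan"], ["call", "golden"]]
log_probs = [math.log(0.8), math.log(0.2)]

plain = mwer_loss(log_probs, hyps, ref)
weighted = mwer_loss(log_probs, hyps, ref, proper_nouns={"golan"}, penalty=1.0)
```

With the penalty enabled, the hypothesis that replaces the name with a more common word is costlier relative to the baseline, so minimizing the loss pushes more probability mass toward the hypothesis that gets the proper noun right.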
