Paper Title

Transformer-Transducers for Code-Switched Speech Recognition

Authors

Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff

Abstract

We live in a world where 60% of the population can speak two or more languages fluently. Members of these communities constantly switch between languages when having a conversation. As automatic speech recognition (ASR) systems are deployed in the real world, there is a need for practical systems that can handle multiple languages both within a single utterance and across utterances. In this paper, we present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition. We propose three modifications over the vanilla model to handle various aspects of code-switching. First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching. Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching. Finally, we propose a multi-label/multi-audio encoder structure to leverage vast monolingual speech corpora for code-switching. We demonstrate the efficacy of our proposed approaches on the SEAME dataset, a public Mandarin-English code-switching corpus, achieving mixed error rates of 18.5% and 26.3% on the test_man and test_sge sets, respectively.
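To make the mask-based idea concrete, here is a minimal sketch (not the authors' exact recipe) of masking label-encoder inputs by language ID, so the prediction network cannot lean on monolingual context across a code-switch boundary. The `<mask>` symbol, the per-token language IDs, and the mask-one-language rule are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch of LID-conditioned label masking for
# intra-sentential code-switching. All names here are assumptions.

def mask_by_language(tokens, lang_ids, lang_to_mask, mask_token="<mask>"):
    """Replace every label token whose language ID equals `lang_to_mask`.

    tokens    : list of label tokens fed to the label (prediction) encoder
    lang_ids  : per-token language IDs, same length as `tokens`
    """
    return [mask_token if lid == lang_to_mask else tok
            for tok, lid in zip(tokens, lang_ids)]

# Example: a Mandarin-English code-switched label sequence.
tokens = ["我", "想", "book", "a", "ticket", "明天"]
lang_ids = ["zh", "zh", "en", "en", "en", "zh"]

# Hide the English span; the model must predict across the switch point
# without relying on the masked monolingual context.
masked = mask_by_language(tokens, lang_ids, "en")
# masked == ["我", "想", "<mask>", "<mask>", "<mask>", "明天"]
```

In training, one could randomize which language (or which span) is masked per example; here the choice is fixed for clarity.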
