Paper Title

Transformer-Transducers for Code-Switched Speech Recognition

Authors

Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff

Abstract

We live in a world where 60% of the population can speak two or more languages fluently. Members of these communities constantly switch between languages when having a conversation. As automatic speech recognition (ASR) systems are deployed in the real world, there is a need for practical systems that can handle multiple languages both within a single utterance and across utterances. In this paper, we present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition. We propose three modifications over the vanilla model to handle various aspects of code-switching. First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching. Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching. Finally, we propose a multi-label/multi-audio encoder structure to leverage vast monolingual speech corpora for code-switching. We demonstrate the efficacy of our proposed approaches on the SEAME dataset, a public Mandarin-English code-switching corpus, achieving mixed error rates of 18.5% and 26.3% on the test_man and test_sge sets, respectively.
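To make the mask-based idea concrete, here is a minimal sketch (not the authors' exact recipe) of masking label-encoder inputs by language ID, so the prediction network cannot lean on monolingual context across a code-switch boundary. The `<mask>` symbol, the per-token language IDs, and the mask-one-language rule are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch of LID-conditioned label masking for
# intra-sentential code-switching. All names here are assumptions.

def mask_by_language(tokens, lang_ids, lang_to_mask, mask_token="<mask>"):
    """Replace every label token whose language ID equals `lang_to_mask`.

    tokens    : list of label tokens fed to the label (prediction) encoder
    lang_ids  : per-token language IDs, same length as `tokens`
    """
    return [mask_token if lid == lang_to_mask else tok
            for tok, lid in zip(tokens, lang_ids)]

# Example: a Mandarin-English code-switched label sequence.
tokens = ["我", "想", "book", "a", "ticket", "明天"]
lang_ids = ["zh", "zh", "en", "en", "en", "zh"]

# Hide the English span; the model must predict across the switch point
# without relying on the masked monolingual context.
masked = mask_by_language(tokens, lang_ids, "en")
# masked == ["我", "想", "<mask>", "<mask>", "<mask>", "明天"]
```

In training, one could randomize which language (or which span) is masked per example; here the choice is fixed for clarity.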
