Paper Title
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters
Paper Authors
Paper Abstract
We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages, and overall simplifying the deployment of ASR systems that support diverse languages. We perform an extensive benchmark on 51 languages, with varying amounts of training data per language (from 100 hours to 1100 hours). We compare three variants of multilingual training, from a single joint model without knowing the input language, to using this information, to multiple heads (one per language cluster). We show that multilingual training of ASR models on several languages can improve recognition performance, in particular on low-resource languages. We see 20.9%, 23%, and 28.8% average relative WER reduction compared to monolingual baselines for the joint model, the joint model with language input, and the multi-head model, respectively. To our knowledge, this is the first work studying multilingual ASR at massive scale, with more than 50 languages and more than 16,000 hours of audio across them.
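The abstract contrasts three architectures: a joint model, a joint model conditioned on the input language, and a multi-head model with one output head per language cluster. The following is a minimal PyTorch sketch of what the multi-head variant could look like; the LSTM encoder, layer sizes, cluster names, and vocabulary sizes are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class MultiHeadASR(nn.Module):
    """Shared acoustic encoder with one output head per language cluster.

    A hypothetical sketch of the multi-head variant from the abstract;
    all sizes and cluster names here are assumed for illustration.
    """

    def __init__(self, n_mels=80, d_model=512, cluster_vocabs=None):
        super().__init__()
        # Token vocabulary size per language cluster (assumed values).
        cluster_vocabs = cluster_vocabs or {"romance": 500, "slavic": 500}
        # Acoustic encoder shared across all languages.
        self.encoder = nn.LSTM(
            input_size=n_mels,
            hidden_size=d_model,
            num_layers=4,
            batch_first=True,
            bidirectional=True,
        )
        # One classification head per language cluster; only the head
        # matching the utterance's cluster is used.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(2 * d_model, vocab)
             for name, vocab in cluster_vocabs.items()}
        )

    def forward(self, feats, cluster):
        # feats: (batch, time, n_mels) log-mel features.
        encoded, _ = self.encoder(feats)      # (batch, time, 2 * d_model)
        return self.heads[cluster](encoded)   # per-frame token logits


# Route a batch of, say, Spanish utterances to the "romance" head.
model = MultiHeadASR()
feats = torch.randn(8, 200, 80)               # 8 utterances, 200 frames
logits = model(feats, cluster="romance")      # shape: (8, 200, 500)
```

For the joint-model-with-language-input variant, one common approach (an assumption here, not a detail stated in the abstract) is to concatenate a learned language-ID embedding to the acoustic features at every frame, so a single shared head serves all languages.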