Multilingual Bottleneck Features for Improving ASR Performance of Code-Switched Speech in Under-Resourced Languages

Paper title

Multilingual Bottleneck Features for Improving ASR Performance of Code-Switched Speech in Under-Resourced Languages

Authors

Padhi, Trideba; Biswas, Astik; De Wet, Febe; van der Westhuizen, Ewald; Niesler, Thomas

Abstract

In this work, we explore the benefits of using multilingual bottleneck features (mBNF) in acoustic modelling for the automatic speech recognition of code-switched (CS) speech in African languages. The unavailability of annotated corpora in the languages of interest has always been a primary challenge when developing speech recognition systems for this severely under-resourced type of speech. Hence, it is worthwhile to investigate the potential of using speech corpora available for other better-resourced languages to improve speech recognition performance. To achieve this, we train an mBNF extractor using nine Southern Bantu languages that form part of the freely available multilingual NCHLT corpus. We append these mBNFs to the existing MFCCs, pitch features and i-vectors to train acoustic models for automatic speech recognition (ASR) in the target code-switched languages. Our results show that the inclusion of the mBNF features leads to clear performance improvements over a baseline trained without the mBNFs for code-switched English-isiZulu, English-isiXhosa, English-Sesotho and English-Setswana speech.
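
The feature-appending step described in the abstract can be pictured with a minimal sketch. It assumes that the MFCC, pitch and mBNF streams are frame-synchronous and that a single i-vector is repeated for every frame of the utterance. The function name `append_features` and the toy dimensions are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Frame = List[float]


def append_features(
    mfcc: Sequence[Frame],
    pitch: Sequence[Frame],
    mbnf: Sequence[Frame],
    ivector: Sequence[float],
) -> List[Frame]:
    """Concatenate per-frame feature streams into one acoustic feature vector.

    Assumption (not from the paper): all three frame-level streams have the
    same number of frames, and the utterance-level i-vector is simply copied
    onto every frame.
    """
    if not (len(mfcc) == len(pitch) == len(mbnf)):
        raise ValueError("frame-level streams must have the same number of frames")
    return [
        list(m) + list(p) + list(b) + list(ivector)
        for m, p, b in zip(mfcc, pitch, mbnf)
    ]


# Toy example: 3 frames, hypothetical dimensions 4 (MFCC), 1 (pitch),
# 3 (mBNF) and 2 (i-vector). Real systems use much larger dimensions.
num_frames = 3
mfcc = [[0.1] * 4 for _ in range(num_frames)]
pitch = [[0.2] * 1 for _ in range(num_frames)]
mbnf = [[0.3] * 3 for _ in range(num_frames)]
ivector = [0.4, 0.5]

features = append_features(mfcc, pitch, mbnf, ivector)
```

In this sketch every output frame holds the four feature types side by side, so the baseline without mBNFs corresponds to leaving the `mbnf` block out of the concatenation.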
