Paper title

Multilingual Bottleneck Features for Improving ASR Performance of Code-Switched Speech in Under-Resourced Languages

Authors

Padhi, Trideba; Biswas, Astik; De Wet, Febe; van der Westhuizen, Ewald; Niesler, Thomas

Abstract

In this work, we explore the benefits of using multilingual bottleneck features (mBNFs) in acoustic modelling for the automatic speech recognition (ASR) of code-switched (CS) speech in African languages. The unavailability of annotated corpora in the languages of interest has always been a primary challenge when developing speech recognition systems for this severely under-resourced type of speech. Hence, it is worthwhile to investigate the potential of using speech corpora available for other, better-resourced languages to improve speech recognition performance. To achieve this, we train an mBNF extractor using nine Southern Bantu languages that form part of the freely available multilingual NCHLT corpus. We append these mBNFs to the existing MFCCs, pitch features and i-vectors to train acoustic models for ASR in the target code-switched languages. Our results show that including the mBNFs leads to clear performance improvements over a baseline trained without them for code-switched English-isiZulu, English-isiXhosa, English-Sesotho and English-Setswana speech.
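The abstract says the mBNFs are appended to the MFCC, pitch and i-vector inputs of the acoustic model. The following is a minimal Python sketch of that frame-level concatenation. It assumes that the MFCC, pitch and mBNF streams are frame-synchronous and that a single utterance-level i-vector is repeated on every frame. The function name `append_mbnf` and all toy dimensions are hypothetical. The paper's actual pipeline may differ; for example, it may use online i-vectors that are updated every few frames.

```python
from typing import List, Sequence

Frame = List[float]


def append_mbnf(
    mfcc: Sequence[Sequence[float]],
    pitch: Sequence[Sequence[float]],
    ivector: Sequence[float],
    mbnf: Sequence[Sequence[float]],
) -> List[Frame]:
    """Hypothetical helper: build per-frame acoustic-model inputs as
    [MFCC | pitch | i-vector | mBNF] for every frame.

    Assumptions (illustrative, not taken from the paper):
      * mfcc, pitch and mbnf have the same number of frames;
      * one utterance-level i-vector is repeated on every frame.
    """
    n_frames = len(mfcc)
    if len(pitch) != n_frames or len(mbnf) != n_frames:
        raise ValueError(
            f"frame count mismatch: mfcc={n_frames}, "
            f"pitch={len(pitch)}, mbnf={len(mbnf)}"
        )
    features: List[Frame] = []
    for t in range(n_frames):
        # Concatenate the four streams along the feature dimension for frame t.
        features.append(
            list(mfcc[t]) + list(pitch[t]) + list(ivector) + list(mbnf[t])
        )
    return features


if __name__ == "__main__":
    # Toy sizes chosen only for illustration: 3 frames,
    # 4-dim MFCC, 1-dim pitch, 2-dim i-vector, 2-dim mBNF.
    mfcc = [[0.1, 0.2, 0.3, 0.4]] * 3
    pitch = [[1.0]] * 3
    ivector = [5.0, 6.0]
    mbnf = [[7.0, 8.0], [7.5, 8.5], [7.9, 8.9]]
    feats = append_mbnf(mfcc, pitch, ivector, mbnf)
    print(len(feats), len(feats[0]))  # 3 9
```

Concatenation does not change any individual stream. It only widens the input vector, so the acoustic model learns how to weight the multilingual bottleneck dimensions against the conventional features. This is what allows a baseline without mBNFs to be compared directly with the augmented system.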