Multi-style Training for South African Call Centre Audio

Paper title

Multi-style Training for South African Call Centre Audio

Authors

Walter Heymans, Marelie H. Davel, Charl van Heerden

Abstract

Mismatched data is a challenging problem for automatic speech recognition (ASR) systems. One of the most common techniques used to address mismatched data is multi-style training (MTR), a form of data augmentation that attempts to transform the training data to be more representative of the testing data, and to learn robust representations applicable to different conditions. This task can be very challenging if the test conditions are unknown. We explore the impact of different MTR styles on system performance when testing conditions differ from training conditions, in the context of deep neural network hidden Markov model (DNN-HMM) ASR systems. A controlled environment is created using the LibriSpeech corpus, where we isolate the effect of different MTR styles on final system performance. We evaluate our findings on a South African call centre dataset that contains noisy, WAV49-encoded audio.
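The abstract does not list the MTR styles the authors used, so the following is only a minimal sketch of one common style, additive noise at a chosen signal-to-noise ratio (SNR). The function names, the SNR values, and the synthetic tone and noise are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    noise_power = power(noise)
    if noise_power == 0:
        return list(clean)
    # SNR_dB = 10 * log10(P_clean / P_scaled_noise); solve for the noise gain.
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]


def multi_style_copies(clean, noise, snrs_db=(20, 10, 5)):
    """Return the clean signal plus one noisy copy per SNR (hypothetical style set)."""
    return [list(clean)] + [mix_at_snr(clean, noise, snr) for snr in snrs_db]


# Synthetic stand-ins for real audio: a 1 s, 440 Hz tone at 8 kHz and white noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
augmented = multi_style_copies(clean, noise)
```

In a real pipeline each augmented copy would be added to the training set alongside the original transcript. Other styles, such as reverberation or re-encoding with a lossy codec, would follow the same pattern of transforming the clean signal while keeping the label.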
