Paper Title
On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech
Paper Authors
Paper Abstract
Advanced neural network models have penetrated Automatic Speech Recognition (ASR) in recent years; in language modeling, however, many systems still rely partly or entirely on traditional Back-off N-gram Language Models (BNLM). The reason for this is the high cost and complexity of training and using neural language models, which are mostly applied by adding a second decoding pass (rescoring). In our recent work we significantly improved the online performance of a conversational speech transcription system by transferring knowledge from a Recurrent Neural Network Language Model (RNNLM) to the single-pass BNLM with text-generation-based data augmentation. In the present paper we analyze the amount of transferable knowledge and demonstrate that the neurally augmented LM (RNN-BNLM) can capture almost 50% of the knowledge of the RNNLM while dropping the second decoding pass and making the system real-time capable. We also systematically compare word and subword LMs and show that subword-based neural text augmentation can be especially beneficial in under-resourced conditions. In addition, we show that by using the RNN-BNLM in the first pass followed by a neural second pass, offline ASR results can be significantly improved as well.
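The core idea of the abstract, text-generation-based data augmentation, can be illustrated with a minimal sketch: sample a large synthetic corpus from a trained RNNLM and pool it with the original training text before estimating the back-off n-gram LM. This is only a sketch under stated assumptions, not the paper's implementation: the `WordLSTM` architecture, vocabulary handling, file names, and BOS/EOS conventions below are all illustrative, and the BNLM itself would be trained on the pooled text with a standard toolkit such as SRILM or KenLM.

```python
# Sketch: generating augmentation text from a trained word-level RNNLM
# (PyTorch). All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class WordLSTM(nn.Module):
    """A simple word-level LSTM language model (assumed architecture)."""

    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids, state=None):
        h, state = self.lstm(self.embed(ids), state)
        return self.out(h), state


@torch.no_grad()
def sample_sentence(model, bos_id, eos_id, max_len=40, temperature=1.0):
    """Draw one sentence from the RNNLM by ancestral sampling."""
    ids, state = [bos_id], None
    for _ in range(max_len):
        logits, state = model(torch.tensor([[ids[-1]]]), state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        next_id = torch.multinomial(probs, 1).item()
        if next_id == eos_id:
            break
        ids.append(next_id)
    return ids[1:]  # drop BOS


# Usage sketch (vocab, BOS/EOS ids, and checkpoint path are hypothetical):
# model = WordLSTM(vocab_size=len(vocab))
# model.load_state_dict(torch.load("rnnlm.pt"))
# with open("generated.txt", "w") as f:
#     for _ in range(1_000_000):
#         words = [vocab[i] for i in sample_sentence(model, BOS, EOS)]
#         f.write(" ".join(words) + "\n")
# The RNN-BNLM is then an n-gram model estimated (e.g., with KenLM) on the
# original corpus concatenated with generated.txt.
```

For the subword variant compared in the paper, the same pipeline would apply after segmenting both the original and generated text into subword units (e.g., with a tokenizer such as SentencePiece), so that the n-gram model is estimated over subwords rather than words.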