Paper Title

Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search

Paper Authors

Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo

Paper Abstract

In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words not observed when training the speech recognition system. Using subword language models (LMs) in the first-pass recognition makes it possible to recognize the OOV words, but even subword n-gram LMs suffer from data sparsity. Recurrent Neural Network (RNN) LMs alleviate the sparsity problem but are not suitable for first-pass recognition as such. One way to solve this is to approximate the RNNLMs by back-off n-gram models. In this paper, we propose to interpolate the conventional n-gram models and the RNNLM approximation for better OOV recognition. Furthermore, we develop a new RNNLM approximation method suitable for subword units: it produces variable-order n-grams to include long-span approximations, and also considers n-grams that were not originally observed in the training corpus. To evaluate these models on OOVs, we set up Arabic and Finnish Keyword Search tasks concentrating only on OOV words. On these tasks, interpolating the baseline RNNLM approximation and a conventional LM outperforms the conventional LM in terms of the Maximum Term Weighted Value for single-character subwords. Moreover, replacing the baseline approximation with the proposed method achieves the best performance on both multi- and single-character subwords.
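Note: on a standard reading, the interpolation described in the abstract is linear interpolation of the two models' subword probabilities; the weight symbol λ below is illustrative notation, not necessarily the paper's own.

P(w | h) = λ · P_ngram(w | h) + (1 − λ) · P_RNN-approx(w | h),   0 ≤ λ ≤ 1,

where w is the next subword and h its history. The Maximum Term Weighted Value (MTWV) used in the evaluation is the standard keyword-search metric: the Term Weighted Value TWV(θ) = 1 − P_miss(θ) − β · P_FA(θ), maximized over the detection threshold θ, where β is an evaluation constant that weights false alarms against misses.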
