Paper Title
LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition
Paper Authors
Paper Abstract
LSTM language models (LSTM-LMs) have proven powerful and have yielded significant performance improvements over count-based n-gram LMs in modern speech recognition systems. Due to their unbounded history states and computational load, most previous studies focus on applying LSTM-LMs in the second pass for rescoring purposes. Recent work shows that it is feasible and computationally affordable to adopt LSTM-LMs in first-pass decoding within a dynamic (or tree-based) decoder framework. In this work, the LSTM-LM is composed with a WFST decoder on the fly for first-pass decoding. Furthermore, motivated by the long-term history nature of LSTM-LMs, the use of context beyond the current utterance is explored for first-pass decoding in conversational speech recognition. The context information is captured by the hidden states of LSTM-LMs across utterances and can be used to guide the first-pass search effectively. Experimental results on our internal meeting transcription system show that significant performance improvements can be obtained by incorporating the contextual information with LSTM-LMs in first-pass decoding, compared to applying the contextual information in second-pass rescoring.
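
To make the cross-utterance mechanism concrete, the sketch below shows one way LSTM hidden states can be threaded across utterance boundaries so the LM conditions on conversational history rather than resetting per utterance. This is a minimal, hypothetical PyTorch illustration, not the paper's implementation; the names ContextualLSTMLM and score_utterance, the layer sizes, and the state-reset policy are all invented for this example.

    # A minimal sketch of carrying LSTM-LM hidden states across utterances.
    # Hypothetical illustration only; not the authors' system.
    import torch
    import torch.nn as nn

    class ContextualLSTMLM(nn.Module):
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens, state=None):
            # tokens: (batch, time) word ids; state: optional (h, c) carried
            # over from the previous utterance, which supplies the
            # cross-utterance conversational context.
            out, state = self.lstm(self.embed(tokens), state)
            return self.proj(out), state

    def score_utterance(model, tokens, state=None):
        # Returns per-token log-probabilities and the final LSTM state.
        # Feeding the returned state into the next call lets the LM condition
        # on conversational history instead of resetting every utterance.
        logits, state = model(tokens.unsqueeze(0), state)
        return torch.log_softmax(logits, dim=-1).squeeze(0), state

    # Usage: thread the state through consecutive utterances of a
    # conversation; reset it only at conversation boundaries (an assumed
    # policy for this sketch).
    model = ContextualLSTMLM(vocab_size=10000)
    state = None
    for utt in [torch.tensor([1, 5, 42, 7]), torch.tensor([1, 9, 3])]:
        log_probs, state = score_utterance(model, utt, state)

In a first-pass decoding setup as described in the abstract, such per-token log-probabilities would be composed with the WFST search on the fly, with the carried-over state biasing the search toward hypotheses consistent with the conversation so far.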