Paper Title

Early Stage LM Integration Using Local and Global Log-Linear Combination

Paper Authors

Wilfried Michel, Ralf Schlüter, Hermann Ney

Paper Abstract

Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM) for the task of automatic speech recognition. One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora. Language model integration is straightforward with the clear separation of acoustic model and language model in classical HMM-based modeling. In contrast, multiple integration schemes have been proposed for attention models. In this work, we present a novel method for language model integration into implicit-alignment based sequence-to-sequence models. Log-linear model combination of acoustic and language model is performed with a per-token renormalization. This allows us to compute the full normalization term efficiently both in training and in testing. This is compared to a global renormalization scheme which is equivalent to applying shallow fusion in training. The proposed methods show good improvements over standard model combination (shallow fusion) on our state-of-the-art Librispeech system. Furthermore, the improvements are persistent even if the LM is exchanged for a more powerful one after training.
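As a sketch of the two combination schemes described in the abstract (the notation below, including the scaling exponents $\lambda_1, \lambda_2$, is assumed for illustration rather than taken from the paper): in the local, per-token combination, the sequence-to-sequence acoustic model (AM) and the external language model (LM) are combined log-linearly and renormalized over the output vocabulary $V$ at every output step $n$,

$$p(w_n \mid w_1^{n-1}, x) = \frac{p_{\text{AM}}(w_n \mid w_1^{n-1}, x)^{\lambda_1}\; p_{\text{LM}}(w_n \mid w_1^{n-1})^{\lambda_2}}{\sum_{v \in V} p_{\text{AM}}(v \mid w_1^{n-1}, x)^{\lambda_1}\; p_{\text{LM}}(v \mid w_1^{n-1})^{\lambda_2}},$$

so the normalization term is a single sum over the vocabulary and can be computed exactly in both training and decoding. In the global combination, the renormalization instead runs over whole label sequences,

$$p(w_1^N \mid x) = \frac{p_{\text{AM}}(w_1^N \mid x)^{\lambda_1}\; p_{\text{LM}}(w_1^N)^{\lambda_2}}{\sum_{\tilde{w}_1^{\tilde{N}}} p_{\text{AM}}(\tilde{w}_1^{\tilde{N}} \mid x)^{\lambda_1}\; p_{\text{LM}}(\tilde{w}_1^{\tilde{N}})^{\lambda_2}},$$

which is the sequence-level objective that corresponds to applying shallow fusion during training; its denominator sums over all possible sequences and therefore generally has to be approximated, e.g. over a beam or n-best list, rather than computed exactly.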
