Paper Title


Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

Paper Authors

Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

Paper Abstract


Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are applied to the final posterior output of a phoneme-based neural transducer with limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypothesis generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to the N-best-list based minimum Bayes risk objective, lattice-free methods gain a 40%-70% relative training-time speedup with only a small degradation in performance.
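To make the baseline concrete: the N-best-list MBR objective that the abstract compares against computes the expected risk (e.g., edit distance to the reference) under a posterior renormalized over the N decoded hypotheses. The sketch below is illustrative only and not the paper's implementation; the function name and the plain-Python formulation are assumptions. The lattice-free variants proposed in the paper replace this explicit N-best sum with a full sum over all sequences, which is what removes the decoding step from training.

```python
import math

def nbest_mbr_loss(hyp_log_scores, hyp_risks):
    """Expected risk over an N-best list (illustrative sketch).

    hyp_log_scores: unnormalized log model scores, one per hypothesis.
    hyp_risks: risk of each hypothesis, e.g. edit distance to the reference.
    Returns sum_h P(h | N-best) * risk(h), with the posterior obtained by
    a numerically stable softmax over the N-best scores.
    """
    m = max(hyp_log_scores)                       # shift for stability
    exps = [math.exp(s - m) for s in hyp_log_scores]
    z = sum(exps)                                 # renormalization over N-best
    return sum((e / z) * r for e, r in zip(exps, hyp_risks))

# Three hypothetical hypotheses: the best-scoring one matches the
# reference (risk 0), the others make 1 and 2 word errors.
loss = nbest_mbr_loss([-1.0, -2.0, -3.0], [0.0, 1.0, 2.0])
```

Generating the N-best list itself requires a (costly) beam-search decode of each training utterance, which is exactly the step the lattice-free objectives avoid.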
