Paper Title


Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

Paper Authors

Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

Paper Abstract


Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are applied to the final posterior output of a phoneme-based neural transducer with limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypothesis generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to the N-best-list based minimum Bayes risk objective, lattice-free methods gain a 40%-70% relative training-time speedup with only a small degradation in performance.
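To make the baseline concrete: the N-best-list MBR objective that the abstract compares against computes the expected risk (e.g., edit distance to the reference) under a posterior renormalized over the N decoded hypotheses. The sketch below is illustrative only and not the paper's implementation; the function name and the plain-Python formulation are assumptions. The lattice-free variants proposed in the paper replace this explicit N-best sum with a full sum over all sequences, which is what removes the decoding step from training.

```python
import math

def nbest_mbr_loss(hyp_log_scores, hyp_risks):
    """Expected risk over an N-best list (illustrative sketch).

    hyp_log_scores: unnormalized log model scores, one per hypothesis.
    hyp_risks: risk of each hypothesis, e.g. edit distance to the reference.
    Returns sum_h P(h | N-best) * risk(h), with the posterior obtained by
    a numerically stable softmax over the N-best scores.
    """
    m = max(hyp_log_scores)                       # shift for stability
    exps = [math.exp(s - m) for s in hyp_log_scores]
    z = sum(exps)                                 # renormalization over N-best
    return sum((e / z) * r for e, r in zip(exps, hyp_risks))

# Three hypothetical hypotheses: the best-scoring one matches the
# reference (risk 0), the others make 1 and 2 word errors.
loss = nbest_mbr_loss([-1.0, -2.0, -3.0], [0.0, 1.0, 2.0])
```

Generating the N-best list itself requires a (costly) beam-search decode of each training utterance, which is exactly the step the lattice-free objectives avoid.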
