Paper Title
Efficient Training of Neural Transducer for Speech Recognition
Paper Authors
Paper Abstract
As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-Transducer has achieved steadily improving performance through increasingly sophisticated neural network models of growing size and longer training schedules. While strong computation resources seem to be a prerequisite for training superior models, we try to overcome this limitation by carefully designing a more efficient training pipeline. In this work, we propose an efficient 3-stage progressive training pipeline to build high-performing neural transducer models from scratch with very limited computation resources in a reasonably short time. The effectiveness of each stage is experimentally verified on both the Librispeech and Switchboard corpora. The proposed pipeline is able to train transducer models approaching state-of-the-art performance with a single GPU in just 2-3 weeks. Our best conformer transducer achieves 4.1% WER on the Librispeech test-other set with only 35 epochs of training.
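The abstract does not spell out what the three stages are, so the sketch below is purely illustrative: it shows the general shape of a progressive training driver in which each stage resumes from the previous stage's checkpoint under a reduced learning rate. All stage names, epoch budgets, learning rates, and the train_epochs helper are hypothetical assumptions, not the paper's actual recipe; only the total of 35 epochs is taken from the abstract.

```python
# Hypothetical sketch of a 3-stage progressive training driver.
# Stage names, epoch budgets, learning rates, and train_epochs()
# are illustrative assumptions, not the paper's actual recipe.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Stage:
    name: str
    epochs: int
    learning_rate: float

def train_epochs(stage: Stage, checkpoint: Optional[dict]) -> dict:
    """Placeholder for one training phase: a real implementation would
    run `stage.epochs` epochs warm-started from `checkpoint` and return
    the updated model state."""
    start = "scratch" if checkpoint is None else checkpoint["last_stage"]
    print(f"[{stage.name}] {stage.epochs} epochs @ lr={stage.learning_rate}, from {start}")
    return {"last_stage": stage.name}  # stand-in for real model state

# Three progressive stages totalling the 35 epochs quoted in the abstract.
pipeline = [
    Stage("stage1_warmup",   10, 1e-3),
    Stage("stage2_refine",   15, 5e-4),
    Stage("stage3_finetune", 10, 1e-4),
]

checkpoint: Optional[dict] = None
for stage in pipeline:
    checkpoint = train_epochs(stage, checkpoint)
```

The design point this sketch illustrates is that each stage is cheap to restart and builds on the previous one, which is what lets a pipeline like this reach competitive accuracy on a single GPU within a bounded epoch budget.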