Paper Title

ASAPP-ASR: SOTA Speech Recognition

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition

Authors

Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu J. Han, Tao Lei, Tao Ma

Abstract

In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling. In the hybrid ASR framework, the multistream CNN acoustic model processes an input of speech frames in multiple parallel pipelines where each stream has a unique dilation rate for diversity. Trained with the SpecAugment data augmentation method, it achieves relative word error rate (WER) improvements of 4% on test-clean and 14% on test-other. We further improve the performance via N-best rescoring using a 24-layer self-attentive SRU language model, achieving WERs of 1.75% on test-clean and 4.46% on test-other.
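The core idea of the multistream acoustic model is that the same sequence of speech frames is fed to several parallel convolutional streams, each with its own dilation rate, and the stream outputs are then combined. The toy sketch below illustrates only that dilation-and-concatenate idea in NumPy; it is not the paper's actual architecture (the real model uses learned multi-channel kernels and deeper per-stream pipelines), and the kernel taps and dilation rates here are arbitrary illustrative values.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D convolution over time with dilated kernel taps.
    x: (T, D) array of frames; kernel: (K,) taps shared across the D
    channels (a simplification -- a real model learns a full weight tensor)."""
    K = len(kernel)
    span = (K - 1) * dilation            # extra frames the receptive field covers
    T_out = x.shape[0] - span
    out = np.zeros((T_out, x.shape[1]))
    for t in range(T_out):
        for k in range(K):
            out[t] += kernel[k] * x[t + k * dilation]
    return out

def multistream(x, dilations=(1, 2, 3), kernel=(0.25, 0.5, 0.25)):
    """Run the same input through parallel streams, one dilation rate per
    stream, then concatenate the outputs along the feature axis.
    Streams with larger dilations see a wider temporal context."""
    kernel = np.asarray(kernel)
    streams = [dilated_conv1d(x, kernel, d) for d in dilations]
    T = min(s.shape[0] for s in streams)  # crop all streams to the shortest
    return np.concatenate([s[:T] for s in streams], axis=1)
```

For example, 20 frames of 4-dimensional features with dilations (1, 2, 3) and a 3-tap kernel yield a (14, 12) output: 14 time steps survive the widest receptive field, and the three 4-dimensional stream outputs are stacked into 12 features per step.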
