Paper Title

Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Authors

Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney

Abstract

Speaker adaptation is important for building robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in word error rate (WER) on the CallHome part of Hub5'00 and Hub5'01, respectively. Moreover, we build on our previous work, in which we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe, achieving an 11% relative improvement in WER on the Switchboard 300h Hub5'00 dataset. We also make this recipe more efficient by reducing the total number of parameters by 34% relative.
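
As a rough illustration of the Weighted-Simple-Add idea described in the abstract, the sketch below adds a scaled speaker vector to every frame that would be fed into a self-attention module. The abstract states only that weighted speaker information vectors are added to the input of the multi-head self-attention module; the linear projection to the model dimension, the single scalar weight, and all function and variable names here are assumptions made for illustration, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def project(speaker_vec: Vector, weights: List[Vector]) -> Vector:
    """Linearly project a speaker vector to the model dimension (assumed step)."""
    return [sum(w * s for w, s in zip(row, speaker_vec)) for row in weights]


def weighted_simple_add(frames: List[Vector], speaker_vec: Vector,
                        weights: List[Vector], scale: float) -> List[Vector]:
    """Add the scaled, projected speaker vector to every input frame."""
    # The same speaker offset is shared by all frames of the utterance.
    speaker_bias = [scale * v for v in project(speaker_vec, weights)]
    return [[x + b for x, b in zip(frame, speaker_bias)] for frame in frames]


# Toy example: 2 frames with model dimension 2, speaker vector of dimension 3.
frames = [[1.0, 2.0], [3.0, 4.0]]
speaker_vec = [1.0, 0.0, 2.0]
weights = [[1.0, 0.0, 0.0],   # hypothetical 2x3 projection matrix
           [0.0, 1.0, 1.0]]
out = weighted_simple_add(frames, speaker_vec, weights, scale=0.5)
print(out)  # [[1.5, 3.0], [3.5, 5.0]]
```

In a real model the projection matrix and the weight would presumably be learned jointly with the acoustic model, and the result would be passed on to the self-attention module; the toy values above only show the arithmetic.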
