Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Paper title

Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Authors

Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney

Abstract

Speaker adaptation is important for building robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in word error rate (WER) on the CallHome part of Hub5'00 and Hub5'01, respectively. Moreover, we build on our previous work, in which we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe, achieving an 11% relative improvement in WER on the Switchboard 300h Hub5'00 dataset. We also make this recipe more efficient by reducing the total number of parameters by 34% relative.
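
As a rough illustration of the Weighted-Simple-Add idea described in the abstract, the sketch below adds a weighted speaker vector to every input frame before it would enter a self-attention module. It is a minimal sketch under stated assumptions, not the authors' implementation: the linear projection `proj`, the fixed scalar `weight` (which in a real system would presumably be learned), and all function names are hypothetical and are not taken from the abstract.

```python
from typing import List

Vector = List[float]


def project(speaker_vec: Vector, proj: List[Vector]) -> Vector:
    """Linearly map a speaker vector to the model dimension.

    `proj` has shape (model_dim x speaker_dim); this projection is an
    assumption made for the sketch, not something the abstract states.
    """
    return [sum(w * s for w, s in zip(row, speaker_vec)) for row in proj]


def weighted_simple_add(frames: List[Vector], speaker_vec: Vector,
                        proj: List[Vector], weight: float) -> List[Vector]:
    """Add the same weighted, projected speaker vector to every frame."""
    offset = [weight * p for p in project(speaker_vec, proj)]
    return [[x + o for x, o in zip(frame, offset)] for frame in frames]


# Toy example: 2 frames, model dimension 2, speaker vector dimension 3.
frames = [[1.0, 2.0], [3.0, 4.0]]
speaker_vec = [1.0, 0.0, 2.0]
proj = [[1.0, 0.0, 1.0],
        [0.0, 1.0, 0.5]]
adapted = weighted_simple_add(frames, speaker_vec, proj, weight=0.5)
print(adapted)  # [[2.5, 2.5], [4.5, 4.5]]
```

In this toy setup the speaker offset is identical for every frame of an utterance, so the adaptation shifts the attention input by a speaker-dependent bias while leaving the frame-to-frame differences unchanged.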
