Title
Focus on the present: a regularization method for the ASR source-target attention layer
Authors
Abstract
This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training. Our method is based on the fact that both CTC and source-target attention act on the same encoder representations. To understand the functionality of the attention, CTC is applied to compute the token posteriors given the attention outputs. We found that the source-target attention heads are able to predict several tokens ahead of the current one. Inspired by this observation, a new regularization method is proposed which leverages CTC to make the source-target attention more focused on the frames corresponding to the output token currently being predicted by the decoder. Experiments reveal stable relative improvements of up to 7\% and 13\% with the proposed regularization on TED-LIUM 2 and LibriSpeech.
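To make the diagnostic idea concrete, below is a minimal PyTorch sketch of the mechanism the abstract describes: reusing the CTC output layer on the attention-weighted encoder states to obtain token posteriors, plus one plausible form of the "focus" regularizer. All tensor names, shapes, the `ctc_head` parameter, and the exact loss formulation are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch only: interprets source-target attention through the CTC layer.
# Shapes, names, and the loss form are assumptions for illustration.
import torch
import torch.nn.functional as F

def attention_token_posteriors(att_weights, encoder_out, ctc_head):
    """Token posteriors implied by the attention, via the shared CTC layer.

    att_weights: (batch, dec_steps, enc_frames) source-target attention.
    encoder_out: (batch, enc_frames, d_model) shared encoder states.
    ctc_head:    linear layer d_model -> vocab (input to the CTC softmax).
    """
    # Context vectors: attention-weighted sums of encoder states.
    context = torch.bmm(att_weights, encoder_out)        # (B, U, d_model)
    # Reuse the CTC output layer to read the contexts as token posteriors.
    return F.log_softmax(ctc_head(context), dim=-1)      # (B, U, vocab)

def focus_regularizer(att_weights, encoder_out, ctc_head, targets, pad_id=-1):
    """One plausible regularizer: cross-entropy between the CTC posterior of
    the attended frames and the token the decoder is currently predicting,
    penalizing attention that looks ahead of the present token."""
    log_probs = attention_token_posteriors(att_weights, encoder_out, ctc_head)
    # nll_loss expects (B, vocab, U) inputs against (B, U) targets.
    return F.nll_loss(log_probs.transpose(1, 2), targets, ignore_index=pad_id)
```

In a joint CTC/attention setup, a term like `focus_regularizer(...)` would simply be added, with some weight, to the usual interpolated CTC and attention losses; since the encoder representations are shared, no extra parameters are needed.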