Paper Title


Gated Recurrent Context: Softmax-free Attention for Online Encoder-Decoder Speech Recognition

Authors

Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Hyeongju Kim, Nam Soo Kim

Abstract


Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). Since the original AED models with global attention are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for a better user experience. However, a common limitation of the conventional softmax-based online attention approaches is that they introduce an additional hyperparameter related to the length of the attention window, requiring multiple trials of model training to tune the hyperparameter. To deal with this problem, we propose a novel softmax-free attention method and its modified formulation for online attention, which do not need any additional hyperparameter at the training phase. Through a number of ASR experiments, we demonstrate that the tradeoff between the latency and performance of the proposed online attention technique can be controlled by merely adjusting a threshold at the test phase. Furthermore, the proposed methods showed performance competitive with conventional global and online attention in terms of word error rates (WERs).
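The abstract does not spell out the mechanism, but the title ("Gated Recurrent Context") suggests one plausible reading of a softmax-free attention: instead of normalizing energies over all encoder frames with a softmax, the context vector is updated recursively across encoder timesteps with a sigmoid gate, GRU-style, so no sequence-wide normalization is needed; a test-time threshold on the gate could then cut off attention early for online decoding. The sketch below is an illustrative assumption, not the paper's actual formulation; the energy function, gate form, and `threshold` semantics are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_recurrent_context(enc_states, query, W, v, threshold=None):
    """Hypothetical softmax-free attention sketch.

    The context is built recursively over encoder timesteps with a
    scalar sigmoid gate, so no softmax over the full sequence is taken.

    enc_states: (T, D) encoder hidden states
    query:      (D,)   decoder state for the current output step
    W, v:       parameters of an (assumed) additive energy function
    threshold:  optional test-time cutoff -- once a gate drops below
                it, later frames are ignored (assumption), which is
                what would make streaming inference possible.
    """
    T, D = enc_states.shape
    context = np.zeros(D)
    for t in range(T):
        # Scalar energy -> gate in (0, 1); additive form is assumed.
        energy = v @ np.tanh(W @ np.concatenate([enc_states[t], query]))
        z = sigmoid(energy)
        # GRU-style convex update of the running context vector.
        context = (1.0 - z) * context + z * enc_states[t]
        if threshold is not None and z < threshold:
            break  # stop attending to the rest of the sequence
    return context

# Toy usage with random shapes
rng = np.random.default_rng(0)
T, D = 5, 4
h = rng.standard_normal((T, D))
q = rng.standard_normal(D)
W = rng.standard_normal((D, 2 * D))
v = rng.standard_normal(D)
c = gated_recurrent_context(h, q, W, v)
print(c.shape)  # (4,)
```

Note the training-time appeal claimed in the abstract: the recursion introduces no window-length hyperparameter, and the latency/accuracy tradeoff would be governed entirely by the test-time `threshold`.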
