Large scale evaluation of importance maps in automatic speech recognition

Paper title

Large scale evaluation of importance maps in automatic speech recognition

Authors

Trinh, Viet Anh; Mandel, Michael I.

Abstract

In this paper, we propose a metric that we call the structured saliency benchmark (SSBM) to evaluate importance maps computed for automatic speech recognizers on individual utterances. These maps indicate time-frequency points of the utterance that are most important for correct recognition of a target word. Our evaluation technique is not only suitable for standard classification tasks, but is also appropriate for structured prediction tasks like sequence-to-sequence models. Additionally, we use this approach to perform a large scale comparison of the importance maps created by our previously introduced technique using "bubble noise" to identify important points through correlation with a baseline approach based on smoothed speech energy and forced alignment. Our results show that the bubble analysis approach is better at identifying important speech regions than this baseline on 100 sentences from the AMI corpus.
