Modeling Dependent Structure for Utterances in ASR Evaluation

Paper Title

Modeling Dependent Structure for Utterances in ASR Evaluation

Authors

Zhe Liu, Fuchun Peng

Abstract

The bootstrap resampling method has been popular for performing significance analysis on word error rate (WER) in automatic speech recognition (ASR) evaluation. To deal with dependent speech data, the blockwise bootstrap approach has also been introduced. By dividing utterances into uncorrelated blocks, this approach resamples these blocks instead of the original data. However, it is typically nontrivial to uncover the dependent structure among utterances and identify the blocks, which might lead to subjective conclusions in statistical testing. In this paper, we present graphical lasso based methods to explicitly model such dependency and estimate uncorrelated blocks of utterances in a rigorous way, after which the blockwise bootstrap is applied on top of the inferred blocks. We show that the resulting variance estimator of WER in ASR evaluation is statistically consistent under mild conditions. We also demonstrate the validity of the proposed approach on the LibriSpeech dataset.
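
The blockwise bootstrap step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the blocks of utterances are already known (the paper infers them with graphical lasso, which is omitted here), and the function names `wer` and `blockwise_bootstrap_wer`, the toy error counts, and the block assignment are all hypothetical. Whole blocks are resampled with replacement, and the corpus-level WER is recomputed for each resample.

```python
import random
from statistics import variance


def wer(errors, ref_words, indices):
    """Corpus-level WER over the given utterance indices."""
    total_errors = sum(errors[i] for i in indices)
    total_words = sum(ref_words[i] for i in indices)
    return total_errors / total_words


def blockwise_bootstrap_wer(errors, ref_words, blocks, n_resamples=1000, seed=0):
    """Return bootstrap replicates of WER, resampling whole blocks.

    `blocks` is a list of lists of utterance indices. The blocks are assumed
    to be given; the paper estimates them, which this sketch does not do.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_resamples):
        # Draw as many blocks as there are, with replacement.
        sampled = rng.choices(blocks, k=len(blocks))
        indices = [i for block in sampled for i in block]
        replicates.append(wer(errors, ref_words, indices))
    return replicates


# Toy data (hypothetical): per-utterance error counts and reference lengths.
errors = [1, 0, 2, 1, 0, 3]
ref_words = [10, 8, 12, 9, 7, 11]
blocks = [[0, 1], [2, 3], [4, 5]]  # assumed block structure

replicates = blockwise_bootstrap_wer(errors, ref_words, blocks)
wer_variance = variance(replicates)  # bootstrap estimate of Var(WER)
```

Resampling blocks rather than individual utterances keeps dependent utterances together in every replicate, which is why the choice of blocks affects the variance estimate.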
