Paper Title
Progressive Multi-Scale Self-Supervised Learning for Speech Recognition
Paper Authors
Paper Abstract
Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). Moreover, ASR performance could be further improved if the model were dedicated specifically to learning audio content information. To this end, we propose a progressive multi-scale self-supervised learning (PMS-SSL) method, which uses a fine-grained target set to compute the SSL loss at the top layer and coarse-grained target sets at the intermediate layers. Furthermore, PMS-SSL introduces a multi-scale structure into multi-head self-attention for better speech representation: the attention area is restricted to a large scope at higher layers and to a small scope at lower layers. Experiments on the LibriSpeech dataset demonstrate the effectiveness of the proposed method. Compared with HuBERT, PMS-SSL achieves 13.7% / 12.7% relative WER reduction on the test-other evaluation subset when fine-tuned on the 10-hour / 100-hour subsets, respectively.
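The scope-restricted self-attention described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the helper names, the banded-mask formulation, and the example per-layer window schedule are all hypothetical, chosen only to show how a small attention window at lower layers and a larger one at higher layers could be realized.

```python
import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: position i may attend to position j only if |i - j| <= window."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def windowed_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                       window: int) -> np.ndarray:
    """Single-head scaled dot-product attention restricted to a local window.

    q, k, v: arrays of shape (seq_len, dim). Masked positions receive -inf
    scores, so their softmax weight is exactly zero.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    mask = local_attention_mask(q.shape[0], window)
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax; the diagonal is always unmasked, so every
    # row has at least one finite score.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical schedule: attention scope grows with layer depth, mirroring
# the paper's small-scope-at-lower-layers / large-scope-at-higher-layers idea.
layer_windows = [2, 4, 8, 16]

rng = np.random.default_rng(0)
x = rng.standard_normal((12, 8))
for w in layer_windows:
    x = windowed_attention(x, x, x, window=w)
```

In a real multi-head Transformer the mask would be applied per head before softmax (e.g. via an additive `-inf` bias), and the window schedule would be a tunable hyperparameter rather than the fixed doubling shown here.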