Towards End-to-end Unsupervised Speech Recognition

Paper title

Towards End-to-end Unsupervised Speech Recognition

Authors

Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Abstract

Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language. However, existing methods still rely heavily on hand-crafted pre-processing. Similar to the trend of making supervised speech recognition end-to-end, we introduce wav2vec-U 2.0, which does away with all audio-side pre-processing and improves accuracy through a better architecture. In addition, we introduce an auxiliary self-supervised objective that ties model predictions back to the input. Experiments show that wav2vec-U 2.0 improves unsupervised recognition results across different languages while being conceptually simpler.
