Paper Title


Joint Masked CPC and CTC Training for ASR

Authors

Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve

Abstract


Self-supervised learning (SSL) has shown promise in learning representations of audio that are useful for automatic speech recognition (ASR). However, training SSL models like wav2vec 2.0 requires a two-stage pipeline. In this paper we demonstrate a single-stage training of ASR models that can utilize both unlabeled and labeled data. During training, we alternately minimize two losses: an unsupervised masked Contrastive Predictive Coding (CPC) loss and the supervised audio-to-text alignment loss, Connectionist Temporal Classification (CTC). We show that this joint training method directly optimizes performance for the downstream ASR task using unsupervised data while achieving similar word error rates to wav2vec 2.0 on the Librispeech 100-hour dataset. Finally, we postulate that solving the contrastive task is a regularization for the supervised CTC loss.
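The alternating schedule described in the abstract can be sketched in a few lines. This is a minimal toy sketch, not the authors' implementation: the function names (`alternating_train`, `sgd_step`) are hypothetical, and the quadratic gradients are stand-ins for the real masked CPC and CTC gradients, chosen only so the alternation between an unsupervised step and a supervised step is easy to follow.

```python
def sgd_step(theta, grad, lr=0.1):
    """One plain gradient-descent update on a scalar parameter."""
    return theta - lr * grad

def cpc_grad(theta, unlabeled_batch):
    # Toy stand-in for the gradient of the masked CPC loss
    # computed on a batch of unlabeled audio.
    return 2.0 * (theta - unlabeled_batch)

def ctc_grad(theta, labeled_batch):
    # Toy stand-in for the gradient of the CTC loss
    # computed on a batch of labeled (audio, text) pairs.
    return 2.0 * (theta - labeled_batch)

def alternating_train(unlabeled, labeled, theta=0.0):
    # Single-stage training: alternate one unsupervised step and one
    # supervised step, so both data sources update the same model.
    for u_batch, l_batch in zip(unlabeled, labeled):
        theta = sgd_step(theta, cpc_grad(theta, u_batch))
        theta = sgd_step(theta, ctc_grad(theta, l_batch))
    return theta

theta = alternating_train(unlabeled=[1.0] * 3, labeled=[2.0] * 3)
```

After three alternating rounds the parameter sits between the two losses' minima, illustrating how the contrastive objective pulls against, and thereby regularizes, the supervised one.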
