Paper Title

FedNST: Federated Noisy Student Training for Automatic Speech Recognition

Paper Authors

Haaris Mehmood, Agnieszka Dobrowolska, Karthikeyan Saravanan, Mete Ozay

Paper Abstract

Federated Learning (FL) enables training state-of-the-art Automatic Speech Recognition (ASR) models on user devices (clients) in distributed systems, hence preventing transmission of raw user data to a central server. A key challenge facing practical adoption of FL for ASR is obtaining ground-truth labels on the clients. Existing approaches rely on clients to manually transcribe their speech, which is impractical for obtaining large training corpora. A promising alternative is using semi-/self-supervised learning approaches to leverage unlabelled user data. To this end, we propose FedNST, a novel method for training distributed ASR models using private and unlabelled user data. We explore various facets of FedNST, such as training models with different proportions of labelled and unlabelled data, and evaluate the proposed approach on 1173 simulated clients. Evaluating FedNST on LibriSpeech, where 960 hours of speech data is split equally into server (labelled) and client (unlabelled) data, showed a 22.5% relative word error rate reduction (WERR) over a supervised baseline trained only on server data.
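To make the training flow described in the abstract concrete, below is a minimal, runnable sketch of a FedNST-style round. It substitutes a toy linear model and synthetic data for a real ASR network and speech audio, and mimics the "noisy student" idea by perturbing client inputs before the local update. The helper names (`local_train`, `pseudo_label`) and all hyperparameters are illustrative assumptions, not taken from the paper; the structure shown (server-side supervised pre-training, on-device pseudo-labelling, local training, FedAvg aggregation) is the general pattern the abstract describes.

```python
# Illustrative sketch of a FedNST-style training loop (NOT the paper's code).
# A toy linear regression stands in for the ASR model; synthetic vectors stand
# in for speech. Helper names and hyperparameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def local_train(weights, x, y, lr=0.1, epochs=5):
    """Toy local update: a few gradient steps on mean squared error."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * x.T @ (x @ w - y) / len(x)
        w -= lr * grad
    return w

def pseudo_label(weights, x):
    """Stand-in for decoding with the current global (teacher) model.
    A real FedNST system would run ASR inference on local audio here."""
    return x @ weights

# Server side: labelled data trains the initial (teacher) model.
x_server = rng.normal(size=(200, 8))
w_true = rng.normal(size=8)
y_server = x_server @ w_true + 0.1 * rng.normal(size=200)
global_w = local_train(np.zeros(8), x_server, y_server, epochs=50)

# Client side: each client holds only unlabelled local data.
clients = [rng.normal(size=(50, 8)) for _ in range(10)]

for round_idx in range(5):  # federated rounds
    selected = rng.choice(len(clients), size=4, replace=False)
    updates = []
    for c in selected:
        x_c = clients[c]
        # Pseudo-labels are generated on-device: raw data never leaves the client.
        y_hat = pseudo_label(global_w, x_c)
        # "Noisy student" aspect: the local update sees a perturbed input.
        x_noised = x_c + 0.05 * rng.normal(size=x_c.shape)
        updates.append(local_train(global_w, x_noised, y_hat))
    # FedAvg-style aggregation of client model updates on the server.
    global_w = np.mean(updates, axis=0)
```

The key design point this sketch tries to capture is that labels are produced locally by the current global model, so only model weights, never raw user data or transcripts, are exchanged with the server.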
