On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training

Paper title

On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training

Authors

Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

Abstract

In this paper, we explore an improved framework to train a monoaural neural enhancement model for robust speech recognition. The designed training framework extends the existing mixture invariant training criterion to exploit both unpaired clean speech and real noisy data. It is found that the unpaired clean speech is crucial to improve quality of separated speech from real noisy speech. The proposed method also performs remixing of processed and unprocessed signals to alleviate the processing artifacts. Experiments on the single-channel CHiME-3 real test sets show that the proposed method improves significantly in terms of speech recognition performance over the enhancement system trained either on the mismatched simulated data in a supervised fashion or on the matched real data in an unsupervised fashion. Between 16% and 39% relative WER reduction has been achieved by the proposed system compared to the unprocessed signal using end-to-end and hybrid acoustic models without retraining on distorted data.
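
The abstract names the mixture invariant training (MixIT) criterion without spelling it out. As a rough illustration, the following is a minimal NumPy sketch of the standard MixIT objective only, not the extended framework the authors propose: two mixtures are summed, a model estimates several sources from the sum, and the loss keeps the best assignment of estimated sources back to the two mixtures. The function names `neg_snr` and `mixit_loss`, the negative-SNR loss, and the brute-force search over assignments are assumptions made for this example.

```python
import itertools

import numpy as np


def neg_snr(ref, est, eps=1e-8):
    """Negative signal-to-noise ratio in dB (lower means a better match)."""
    noise = ref - est
    return -10.0 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(noise ** 2) + eps))


def mixit_loss(mix1, mix2, est_sources):
    """Illustrative MixIT loss for two reference mixtures.

    est_sources has shape (M, T): M sources estimated from mix1 + mix2.
    Every binary assignment of sources to the two mixtures is tried, and
    the assignment with the lowest summed negative SNR is returned.
    """
    num_sources = est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    for assign in itertools.product((0, 1), repeat=num_sources):
        mask = np.array(assign, dtype=bool)
        remix1 = est_sources[~mask].sum(axis=0)  # sources assigned to mix1
        remix2 = est_sources[mask].sum(axis=0)   # sources assigned to mix2
        loss = neg_snr(mix1, remix1) + neg_snr(mix2, remix2)
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


# Toy usage with an oracle "model" that returns the true sources.
rng = np.random.default_rng(0)
sources = rng.standard_normal((3, 100))
noisy_mix = sources[0] + sources[2]  # stands in for a noisy recording
other_mix = sources[1]               # stands in for a second mixture
loss, assignment = mixit_loss(noisy_mix, other_mix, sources)
```

The exhaustive search costs 2^M loss evaluations, which is acceptable for the small source counts typical of enhancement models. In actual training the same search would run on model outputs inside an automatic-differentiation framework.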
