RNN-Transducer with Language Bias for End-to-End Mandarin-English Code-Switching Speech Recognition

Paper Title

RNN-Transducer with Language Bias for End-to-End Mandarin-English Code-Switching Speech Recognition

Authors

Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Ye Bai

Abstract

Recently, language identity information has been used to improve the performance of end-to-end code-switching (CS) speech recognition. However, previous work relies on an additional language identification (LID) model as an auxiliary module, which makes the system complex. In this work, we propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate this problem. We use the language identities to bias the model toward predicting the CS points. This encourages the model to learn language identity information directly from the transcription, so no additional LID model is needed. We evaluate the approach on SEAME, a Mandarin-English CS corpus. Compared to our RNN-T baseline, the proposed method achieves relative error reductions of 16.2% and 12.9% on the two test sets, respectively.
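As a rough illustration of what learning language identity "directly from transcription" could look like, the sketch below inserts a language tag into a mixed transcript wherever the language changes, so that the tags mark the CS points in the target sequence. The tag names `<zh>` and `<en>`, the helper `add_language_tags`, the character-level Mandarin and word-level English tokenization, and the Unicode-range language test are all assumptions made for this example; the abstract does not specify any of them.

```python
import re
from typing import List

# Hypothetical tag tokens; the abstract does not state the actual format.
ZH_TAG = "<zh>"
EN_TAG = "<en>"

# Assumption: one token per CJK character, one token per English word.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def language_of(token: str) -> str:
    """Return the language tag for a token based on its first character."""
    return ZH_TAG if "\u4e00" <= token[0] <= "\u9fff" else EN_TAG


def add_language_tags(transcript: str) -> List[str]:
    """Tokenize a mixed transcript and insert a language tag at the start
    and at every point where the language switches."""
    tagged: List[str] = []
    previous = None
    for token in _TOKEN_RE.findall(transcript):
        current = language_of(token)
        if current != previous:
            # A change of language marks a code-switching point.
            tagged.append(current)
            previous = current
        tagged.append(token)
    return tagged


if __name__ == "__main__":
    print(add_language_tags("我想去 shopping mall 买东西"))
```

A model trained on target sequences of this kind would have to emit a tag before each language switch, which is one plausible way to bias it toward predicting CS points without a separate LID module.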

Paper Title