TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline

Paper Title

TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline

Authors

Chengfei Li, Shuhao Deng, Yaoping Wang, Guangjing Wang, Yaguang Gong, Changbin Chen, Jinfeng Bai

Abstract

This paper introduces TALCS, a new Mandarin-English code-switching speech recognition corpus suitable for training and evaluating code-switching speech recognition systems. The TALCS corpus is derived from real online one-to-one English teaching sessions at TAL Education Group and contains roughly 587 hours of speech sampled at 16 kHz. To the best of our knowledge, it is the largest well-labeled, open-source Mandarin-English code-switching automatic speech recognition (ASR) dataset in the world. We describe the recording procedure in detail, including the audio capture devices and the corpus environments. The corpus is freely available for download under a permissive license. Using TALCS, we conduct ASR experiments with two popular speech recognition toolkits, ESPnet and WeNet, to build baseline systems, and we compare their Mixture Error Rate (MER) on the corpus. The experimental results imply that the quality of the audio recordings and transcriptions is promising and that the baseline systems are workable.
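
The abstract reports Mixture Error Rate (MER) without defining it. As a rough illustration only, the sketch below computes an MER-style score under the common convention for Mandarin-English code-switching: each Chinese character and each English word counts as one token, and the score is the token-level edit distance divided by the number of reference tokens. The tokenization rule and the function names (`tokenize`, `edit_distance`, `mixture_error_rate`) are assumptions made for this example; they are not taken from the paper or from the ESPnet or WeNet scoring scripts.

```python
import re

# Assumption: one token is a single CJK character or a run of Latin letters.
# Punctuation, digits, and whitespace are ignored in this sketch.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize(text):
    """Split mixed text into Chinese characters and lowercased English words."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mixture_error_rate(reference, hypothesis):
    """Token-level edit distance divided by the reference token count."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference contains no tokens")
    return edit_distance(ref, hyp) / len(ref)
```

For example, with the reference "我喜欢 apple" (four tokens) and the hypothesis "我喜欢 apples", the single substituted English word gives a score of 1/4. Real scoring pipelines usually apply additional text normalization before counting, so numbers from this sketch are not comparable to published baselines.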
