Paper Title
CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment
Paper Authors
Paper Abstract
This paper describes the design and development of CUCHILD, a large-scale Cantonese corpus of child speech. The corpus contains spoken words collected from 1,986 child speakers aged 3 to 6 years. The speech materials comprise 130 words, each 1 to 4 syllables long. The speakers include both typically developing (TD) children and children with speech disorders. The corpus is intended to support scientific and clinical research, as well as technology development related to child speech assessment. The design of the corpus, including word selection, participant recruitment, the data acquisition process, and data pre-processing, is described in detail. Results of acoustic analysis are presented to illustrate the properties of child speech. Potential applications of the corpus in automatic speech recognition, phonological error detection, and speaker diarization are also discussed.