Paper title

RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Author

Stan, Adriana

Abstract

Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet recent studies show that good-quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the data recording and pre-processing steps required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource. Because the languages differ in their degree of orthographic transparency and in the number of phonetic entries available, the DNN's hyperparameters are optimised with an evolution strategy. The phoneme and word error rates of the resulting G2P converters are presented and discussed. The tool, the processed phonetic lexicons and the trained G2P models are made freely available.
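The abstract does not say how the phoneme and word error rates are computed. As a minimal sketch, assuming the conventional definitions (phoneme error rate as the total Levenshtein distance divided by the number of reference phonemes, word error rate as the fraction of words whose predicted transcription differs from the reference in any way), the two metrics could be calculated as follows. The function names `edit_distance` and `error_rates` are hypothetical and are not part of RECOApy.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j symbols of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(refs: List[Sequence[str]],
                hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (phoneme error rate, word error rate) over a test lexicon.

    Assumed definitions, not taken from the paper:
      PER = total edit distance / total number of reference phonemes
      WER = words with at least one phoneme error / number of words
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_phonemes = sum(len(r) for r in refs)
    if total_phonemes == 0:
        raise ValueError("reference transcriptions are empty")

    phoneme_errors = 0
    word_errors = 0
    for ref, hyp in zip(refs, hyps):
        distance = edit_distance(ref, hyp)
        phoneme_errors += distance
        if distance > 0:
            word_errors += 1
    return phoneme_errors / total_phonemes, word_errors / len(refs)


if __name__ == "__main__":
    # Toy data for illustration only, not results from the paper.
    references = [["k", "a", "t"], ["d", "o", "g"]]
    predictions = [["k", "a", "t"], ["d", "u", "g", "z"]]
    per, wer = error_rates(references, predictions)
    print(f"PER = {per:.3f}, WER = {wer:.3f}")
```

Under these definitions, the phoneme error rate can exceed 1.0 when a model inserts many spurious phonemes, whereas the word error rate is always between 0 and 1.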
