Title
End-to-End Automatic Speech Recognition model for the Sudanese Dialect
Authors
Abstract
Designing a natural voice interface relies mostly on speech recognition for the interaction between humans and the digital devices of modern life. In addition, speech recognition narrows the gap between monolingual individuals, allowing them to communicate better. However, the field lacks broad support for several widely spoken languages and their dialects, even though most everyday conversations are carried out in them. This paper examines the viability of designing an Automatic Speech Recognition model for the Sudanese dialect, one of the dialects of Arabic, whose complexity is a product of historical and social conditions unique to its speakers. These conditions are reflected in both the form and the content of the dialect. The paper therefore gives an overview of the Sudanese dialect and describes the collection of representative resources and the pre-processing performed to construct a modest dataset that overcomes the lack of annotated data. It also proposes an end-to-end speech recognition model designed using Convolutional Neural Networks. The Sudanese dialect dataset is intended as a stepping stone toward future Natural Language Processing research targeting the dialect. The designed model provided some insights into the current recognition task and reached an average Label Error Rate of 73.67%.
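The result above is reported as a Label Error Rate (LER). As a rough illustration only, and assuming the common definition (the edit distance between the predicted and reference label sequences, normalized by the reference length), the metric can be sketched in Python. The function names `edit_distance` and `label_error_rate` are hypothetical, the labels are assumed to be characters, and the abstract does not say whether its "average" LER is pooled over the whole test set or a mean of per-utterance rates, so both variants are shown.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions
    needed to turn the reference label sequence into the hypothesis."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference label
                curr[j - 1] + 1,         # insertion of a hypothesis label
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def label_error_rate(refs, hyps, per_utterance: bool = False) -> float:
    """Hypothetical LER helper; assumes every reference is non-empty."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    pairs = list(zip(refs, hyps))
    if per_utterance:
        # Assumed variant: mean of per-utterance rates, each utterance weighted equally.
        return sum(edit_distance(r, h) / len(r) for r, h in pairs) / len(pairs)
    # Assumed default: total edits over total reference labels in the corpus.
    return sum(edit_distance(r, h) for r, h in pairs) / sum(len(r) for r in refs)
```

For example, `label_error_rate(["abc", "de"], ["abd", "de"])` returns 0.2 (one substitution over five reference labels), while the per-utterance variant returns 1/6 because the short second utterance counts as much as the first. Passing lists of words instead of strings turns the same computation into a word error rate.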