Paper Title

Learning Intonation Pattern Embeddings for Arabic Dialect Identification

Authors

Alvarez, Aitor Arronte; Issa, Elsayed Sabry Abdelaal

Abstract

This article presents a full end-to-end pipeline for Arabic Dialect Identification (ADI) using intonation patterns and acoustic representations. Recent approaches to language and dialect identification use linguistic-aware deep architectures that are able to capture phonetic differences amongst languages and dialects. Specifically, in ADI tasks, different combinations of linguistic features and acoustic representations have been successful with deep learning models. The approach presented in this article uses intonation patterns and hybrid residual and bidirectional LSTM networks to learn acoustic embeddings with no additional linguistic information. Results of the experiments show that intonation patterns for Arabic dialects provide sufficient information to achieve state-of-the-art results on the VarDial 17 ADI dataset, outperforming single-feature systems. The pipeline presented is robust to data sparsity, in contrast to other deep learning approaches that require large quantities of data. We conjecture on the importance of sufficient information as a criterion for optimality in a deep learning ADI task, and more generally, its application to acoustic modeling problems. Small intonation patterns, when sufficient in an information-theoretic sense, allow deep learning architectures to learn more accurate speech representations.
