Paper title
Automatic Arabic Dialect Identification Systems for Written Texts: A Survey
Paper authors
Paper abstract
Arabic dialect identification is a natural language processing task that aims to automatically predict the Arabic dialect of a given text. It is the first step in various natural language processing applications, such as machine translation, multilingual text-to-speech synthesis, and cross-language text generation. As a result, interest in the problem has grown over the last decade. In this paper, we present a comprehensive survey of research on Arabic dialect identification in written texts. We first define the problem and its challenges. The survey then critically discusses many aspects of the Arabic dialect identification task. We review the traditional machine learning methods, deep learning architectures, and complex learning approaches applied to Arabic dialect identification. We also detail the features and feature representation techniques used to train the proposed systems. Moreover, we present the taxonomy of Arabic dialects studied in the literature, the levels of text processing at which Arabic dialect identification is conducted (e.g., token, sentence, and document level), and the available annotated resources, including evaluation benchmark corpora. Open challenges and issues are discussed at the end of the survey.