Paper Title

AraBERT: Transformer-based Model for Arabic Language Understanding

Paper Authors

Wissam Antoun, Fady Baly, Hazem Hajj

Paper Abstract

The Arabic language is morphologically rich, with relatively few resources and a less explored syntax compared to English. Given these limitations, Arabic Natural Language Processing (NLP) tasks such as Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA) have proven very challenging to tackle. Recently, with the surge of transformer-based models, language-specific BERT-based models have proven to be very effective at language understanding, provided they are pre-trained on a very large corpus. Such models were able to set new standards and achieve state-of-the-art results on most NLP tasks. In this paper, we pre-trained BERT specifically for the Arabic language, in pursuit of the same success that BERT achieved for the English language. The performance of AraBERT is compared to multilingual BERT from Google and other state-of-the-art approaches. The results show that the newly developed AraBERT achieves state-of-the-art performance on most tested Arabic NLP tasks. The pretrained AraBERT models are publicly available at https://github.com/aub-mind/arabert, in the hope of encouraging research and applications for Arabic NLP.
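The abstract points readers to the publicly released pretrained weights at https://github.com/aub-mind/arabert. As a minimal sketch of how such released checkpoints are typically consumed, the snippet below loads AraBERT through the Hugging Face transformers library and extracts contextual embeddings for an Arabic sentence. The Hub identifier "aubmindlab/bert-base-arabert" is an assumption based on the repository's organization name, not something stated in the abstract; consult the repository for the exact model names and any recommended pre-segmentation steps.

```python
# Minimal sketch (not from the paper): loading a released AraBERT checkpoint
# with the Hugging Face `transformers` library to get contextual embeddings.
from transformers import AutoTokenizer, AutoModel

# Assumed Hub identifier; verify against https://github.com/aub-mind/arabert.
model_name = "aubmindlab/bert-base-arabert"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Tokenize an Arabic sentence and run it through the encoder.
inputs = tokenizer("النص العربي هنا", return_tensors="pt")
outputs = model(**inputs)

# One embedding vector per subword token: (batch, seq_len, hidden_size).
print(outputs.last_hidden_state.shape)
```

For downstream tasks such as SA or NER, the same checkpoint would usually be loaded with a task-specific head (e.g. AutoModelForSequenceClassification or AutoModelForTokenClassification) and fine-tuned on labeled data.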
