Paper Title

I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory

Paper Authors

Dana Halabi, Ebaa Fayyoumi, Arafat Awajan

Paper Abstract

Treebanks are valuable linguistic resources that record the syntactic structure of sentences in addition to POS tags and morphological features. They are mainly used to train statistical parsers. Although statistical natural language parsers have recently become more accurate for languages such as English, those for Arabic still have low accuracy. The purpose of this paper is to construct a new Arabic dependency treebank based on traditional Arabic grammatical theory and the characteristics of the Arabic language, and to investigate their effects on the accuracy of statistical parsers. The proposed Arabic dependency treebank, called I3rab, differs from existing Arabic dependency treebanks in two main respects. The first is the approach to determining the main word of the sentence, and the second is the representation of joined and covert pronouns. To evaluate I3rab, we compared its performance against a subset of the Prague Arabic Dependency Treebank that shares a comparable level of detail. The conducted experiments show that the percentage improvement reached up to 7.5% in UAS and 18.8% in LAS.
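For readers unfamiliar with the two metrics in the abstract: UAS (unlabeled attachment score) is the fraction of tokens whose predicted head is correct, and LAS (labeled attachment score) additionally requires the dependency label to match. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `attachment_scores`, the toy four-token sentence, and the label names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
# Head index 0 denotes the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence or a concatenated corpus."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    # UAS counts tokens whose predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens where both head and label match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, full_hits / n


# Hypothetical toy example: token 3 has the right head but the wrong label,
# token 4 has the wrong head.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "mod")]
pred = [(2, "subj"), (0, "root"), (2, "mod"), (2, "mod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.75 LAS=0.50
```

Because LAS imposes a stricter condition than UAS, it can never exceed UAS on the same data.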
