Paper Title

A Corpus of Adpositional Supersenses for Mandarin Chinese

Authors

Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, Nathan Schneider

Abstract

Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language. Moreover, there is a dearth of annotated corpora for investigating the cross-linguistic variation of adposition semantics, or for building multilingual disambiguation systems. This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese; to the best of our knowledge, this is the first Chinese corpus to be broadly annotated with adposition semantics. Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria, though its development focused primarily on English prepositions (Schneider et al., 2018). We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English. On a Mandarin translation of The Little Prince, we achieve high inter-annotator agreement and analyze semantic correspondences of adposition tokens in bitext.
