Paper Title

Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship

Authors

Temuulen Khishigsuren, Gábor Bella, Khuyagbaatar Batsuren, Abed Alhakim Freihat, Nandu Chandran Nair, Amarsanaa Ganbold, Hadi Khalilia, Yamini Chandrashekar, Fausto Giunchiglia

Abstract

This paper describes a method to enrich lexical resources with content relating to linguistic diversity, based on knowledge from the field of lexical typology. We capture the phenomenon of diversity through the notions of lexical gap and language-specific word and use a systematic method to infer gaps semi-automatically on a large scale. As a first result obtained for the domain of kinship terminology, known to be very diverse throughout the world, we publish a lexico-semantic resource consisting of 198 domain concepts, 1,911 words, and 37,370 gaps covering 699 languages. We see potential in the use of resources such as ours for the improvement of a variety of cross-lingual NLP tasks, which we demonstrate through a downstream application for the evaluation of machine translation systems.
