Paper Title

Igbo-English Machine Translation: An Evaluation Benchmark

Authors

Ignatius Ezeani, Paul Rayson, Ikechukwu Onyenwe, Chinedu Uchechukwu, Mark Hepple

Abstract

Although researchers and practitioners are pushing the boundaries and enhancing the capacities of NLP tools and methods, work on African languages is lagging. Much of the focus is on well-resourced languages such as English, Japanese, German, French, Russian, and Mandarin Chinese. Over 97% of the world's 7000 languages, including African languages, are low-resourced for NLP, i.e. they have little or no data, tools, and techniques for NLP research. For instance, only 5 out of 2965 (0.19%) authors of full-text papers in the ACL Anthology extracted from the 5 major conferences in 2018 (ACL, NAACL, EMNLP, COLING, and CoNLL) are affiliated with African institutions. In this work, we discuss our effort toward building a standard machine translation benchmark dataset for Igbo, one of the 3 major Nigerian languages. Igbo is spoken by more than 50 million people globally, with over 50% of the speakers in southeastern Nigeria. Igbo is low-resourced, although there have been some efforts toward developing IgboNLP, such as part-of-speech tagging and diacritic restoration.
