Paper title
Igbo-English Machine Translation: An Evaluation Benchmark
Paper authors
Paper abstract
Although researchers and practitioners are pushing the boundaries and enhancing the capacities of NLP tools and methods, work on African languages is lagging. Much of the focus is on well-resourced languages such as English, Japanese, German, French, Russian, and Mandarin Chinese. Over 97% of the world's 7000 languages, including African languages, are low-resourced for NLP, i.e. they have little or no data, tools, and techniques for NLP research. For instance, only 5 out of 2965 (0.19%) authors of full-text papers in the ACL Anthology, extracted from the 5 major conferences in 2018 (ACL, NAACL, EMNLP, COLING, and CoNLL), are affiliated with African institutions. In this work, we discuss our effort toward building a standard machine translation benchmark dataset for Igbo, one of the 3 major Nigerian languages. Igbo is spoken by more than 50 million people globally, with over 50% of the speakers in southeastern Nigeria. Igbo is low-resourced, although there have been some efforts toward developing IgboNLP, such as part-of-speech tagging and diacritic restoration.