Paper title
Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages
Paper authors
Abstract
Over the years, there have been campaigns to include African languages in the growing body of research on machine translation (MT) in particular and natural language processing (NLP) in general. Africa has the highest language diversity, with 1500-2000 documented languages and many more undocumented or extinct languages (Lewis, 2009; Bendor-Samuel, 2017). This makes it hard to keep track of the MT research, models, and datasets that have been developed for some of them. As the internet and social media are part of the daily lives of more than half of the world (Lin, 2020), as well as over 40% of Africans (Campbell, 2019), online platforms can help make research, benchmarks, and datasets in these African languages accessible, thereby improving the reproducibility and sharing of existing research and its results. In this paper, we introduce Lanfrica, a novel, ongoing framework that employs a participatory approach to documenting research, projects, benchmarks, and datasets on African languages.