Paper Title

Not always about you: Prioritizing community needs when developing endangered language technology

Authors

Zoey Liu, Crystal Richardson, Richard Hatcher Jr, Emily Prud'hommeaux

Abstract

Languages are classified as low-resource when they lack the quantity of data necessary for training statistical and machine learning tools and models. Causes of resource scarcity vary but can include poor access to technology for developing these resources, a relatively small population of speakers, or a lack of urgency for collecting such resources in bilingual populations where the second language is high-resource. As a result, the languages described as low-resource in the literature are as different as Finnish on the one hand, with millions of speakers using it in every imaginable domain, and Seneca on the other, with only a small handful of fluent speakers using the language primarily in a restricted domain. While issues stemming from the lack of resources necessary to train models unite this disparate group of languages, many other issues cut across the divide between widely spoken low-resource languages and endangered languages. In this position paper, we discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face when working together to develop language technology to support endangered language documentation and revitalization. We report the perspectives of language teachers, Master Speakers and elders from indigenous communities, as well as the point of view of academics. We describe an ongoing fruitful collaboration and make recommendations for future partnerships between academic researchers and language community stakeholders.
