Paper Title
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
Paper Authors
Paper Abstract
The BLOOM model is a large publicly available multilingual language model, but its pretraining was limited to 46 languages. To extend the benefits of BLOOM to other languages without incurring prohibitively large costs, it is desirable to adapt BLOOM to new languages not seen during pretraining. In this work, we apply existing language adaptation strategies to BLOOM and benchmark its zero-shot prompting performance on eight new languages in a resource-constrained setting. We find language adaptation to be effective at improving zero-shot performance in new languages. Surprisingly, we find that adapter-based finetuning is more effective than continued pretraining for large models. In addition, we discover that prompting performance is not significantly affected by language specifics, such as the writing system. It is primarily determined by the size of the language adaptation data. We also add new languages to BLOOMZ, which is a multitask finetuned version of BLOOM capable of following task instructions zero-shot. We find including a new language in the multitask finetuning mixture to be the most effective method to teach BLOOMZ a new language. We conclude that, with sufficient training data, language adaptation can generalize well to diverse languages. Our code is available at https://github.com/bigscience-workshop/multilingual-modeling.
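The abstract contrasts adapter-based finetuning, where the base model is frozen and only a small set of newly injected parameters is trained, with full continued pretraining. As a rough illustration of the adapter-style approach, here is a minimal sketch using the Hugging Face `peft` library; note that LoRA stands in for the adapter methods the paper benchmarks, and the model size, hyperparameters, and adaptation corpus below are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of adapter-based language adaptation for BLOOM.
# Assumptions: LoRA via `peft` stands in for the adapter methods
# benchmarked in the paper; the checkpoint size, LoRA rank, and the
# adaptation corpus (OSCAR German) are illustrative choices only.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "bigscience/bloom-560m"  # small checkpoint for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Freeze the base model; train only low-rank adapter weights injected
# into the attention projections ("query_key_value" in BLOOM blocks).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the 560M weights

# Monolingual text in a language outside BLOOM's pretraining set
# (German here, purely as an example).
dataset = load_dataset("oscar", "unshuffled_deduplicated_de",
                       split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

# Standard causal-LM training loop; only the adapter weights receive
# gradient updates, which is what keeps the adaptation cheap.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bloom-560m-de-adapter",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Because only the adapter parameters are updated, the memory and compute footprint is far smaller than continued pretraining of all model weights, which is why this family of methods suits the resource-constrained setting the abstract describes.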