Paper Title

Extending Multilingual BERT to Low-Resource Languages

Paper Authors

Zihan Wang, Karthikeyan K, Stephen Mayhew, Dan Roth

Paper Abstract

Multilingual BERT (M-BERT) has been a huge success in both supervised and zero-shot cross-lingual transfer learning. However, this success has focused only on the top 104 languages in Wikipedia that it was trained on. In this paper, we propose a simple but effective approach to extend M-BERT (E-BERT) so that it can benefit any new language, and show that our approach benefits languages that are already in M-BERT as well. We perform an extensive set of experiments with Named Entity Recognition (NER) on 27 languages, only 16 of which are in M-BERT, and show an average increase of about 6% F1 on languages that are already in M-BERT and a 23% F1 increase on new languages.
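
The abstract does not spell out how the extension is carried out; as a rough sketch only, the snippet below shows one common way to add new-language wordpieces to multilingual BERT with Hugging Face Transformers before continued pretraining and NER fine-tuning. The checkpoint name refers to the public "bert-base-multilingual-cased" model, and the wordpiece list is a hypothetical placeholder; this is an illustration, not the authors' released E-BERT implementation.

```python
# Sketch: extend multilingual BERT's vocabulary for a new language,
# assuming the public mBERT checkpoint and placeholder wordpieces.
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Hypothetical wordpieces mined from monolingual text in the target language.
new_wordpieces = ["##ulni", "qhichwa"]  # placeholder examples
tokenizer.add_tokens(new_wordpieces)

# Grow the embedding matrix so the new wordpieces receive trainable vectors;
# one would then continue masked-language-model pretraining on target-language
# text and finally fine-tune on NER data.
model.resize_token_embeddings(len(tokenizer))
```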
