Paper Title
WikiBERT models: deep transfer learning for many languages
Paper Authors
Paper Abstract
Deep neural language models such as BERT have enabled substantial recent advances in many natural language processing tasks. Due to the effort and computational cost involved in their pre-training, language-specific models are typically introduced only for a small number of high-resource languages such as English. While multilingual models covering large numbers of languages are available, recent work suggests monolingual training can produce better models, and our understanding of the tradeoffs between mono- and multilingual training is incomplete. In this paper, we introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data and introduce 42 new such models, most for languages up to now lacking dedicated deep neural language models. We assess the merits of these models using the state-of-the-art UDify parser on Universal Dependencies data, contrasting performance with results using the multilingual BERT model. We find that UDify using WikiBERT models outperforms the parser using mBERT on average, with the language-specific models showing substantially improved performance for some languages, yet limited improvement or a decrease in performance for others. We also present preliminary results as first steps toward an understanding of the conditions under which language-specific models are most beneficial. All of the methods and models introduced in this work are available under open licenses from https://github.com/turkunlp/wikibert.
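As a minimal illustration of how the released models can be used, the sketch below loads a language-specific WikiBERT model alongside multilingual BERT with the Hugging Face transformers library and compares their tokenizations of the same sentence. The checkpoint identifier "TurkuNLP/wikibert-base-fi-cased" is an assumption for illustration; the actual model names are listed in the repository at https://github.com/turkunlp/wikibert.

```python
from transformers import AutoModel, AutoTokenizer

# Language-specific WikiBERT model (Finnish used here as an example).
# NOTE: the identifier below is assumed; check the repository for the real name.
wikibert_name = "TurkuNLP/wikibert-base-fi-cased"
wikibert_tok = AutoTokenizer.from_pretrained(wikibert_name)
wikibert = AutoModel.from_pretrained(wikibert_name)

# Multilingual BERT baseline, the comparison model used in the paper.
mbert_name = "bert-base-multilingual-cased"
mbert_tok = AutoTokenizer.from_pretrained(mbert_name)
mbert = AutoModel.from_pretrained(mbert_name)

# Tokenize the same sentence with both vocabularies; a language-specific
# vocabulary typically splits words into fewer subword pieces.
sentence = "Turun yliopisto sijaitsee Suomessa."
print(wikibert_tok.tokenize(sentence))
print(mbert_tok.tokenize(sentence))
```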