Paper Title

MicroBERT: Effective Training of Low-resource Monolingual BERTs through Parameter Reduction and Multitask Learning

Authors

Luke Gessler, Amir Zeldes

Abstract

Transformer language models (TLMs) are critical for most NLP tasks, but they are difficult to create for low-resource languages because of how much pretraining data they require. In this work, we investigate two techniques for training monolingual TLMs in a low-resource setting: greatly reducing TLM size, and complementing the masked language modeling objective with two linguistically rich supervised tasks (part-of-speech tagging and dependency parsing). Results from 7 diverse languages indicate that our model, MicroBERT, is able to produce marked improvements in downstream task evaluations relative to a typical monolingual TLM pretraining approach. Specifically, we find that monolingual MicroBERT models achieve gains of up to 18% for parser LAS and 11% for NER F1 compared to a multilingual baseline, mBERT, while having less than 1% of its parameter count. We conclude that reducing TLM parameter count and using labeled data for pretraining low-resource TLMs can yield large quality benefits and in some cases produce models that outperform multilingual approaches.
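To make the multitask pretraining idea concrete, here is a minimal PyTorch sketch, not the authors' released code: a small shared Transformer encoder trained with a masked language modeling loss plus two supervised auxiliary losses, POS tagging and a simplified unlabeled dependency-head-selection objective standing in for full dependency parsing. All sizes, head designs, the dummy data, and the unweighted summed loss are illustrative assumptions rather than the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Toy sizes; the real model's vocabulary, tag set, and dimensions would differ.
VOCAB, N_POS, MAX_LEN, D, BATCH = 2000, 17, 32, 64, 8

class TinyMultitaskEncoder(nn.Module):
    """A very small shared Transformer encoder with three task heads (illustrative)."""
    def __init__(self):
        super().__init__()
        self.tok_embed = nn.Embedding(VOCAB, D)
        self.pos_embed = nn.Embedding(MAX_LEN, D)  # absolute position embeddings
        layer = nn.TransformerEncoderLayer(
            d_model=D, nhead=4, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # "micro" depth
        self.mlm_head = nn.Linear(D, VOCAB)   # masked language modeling
        self.pos_head = nn.Linear(D, N_POS)   # part-of-speech tagging
        self.arc_head = nn.Linear(D, D)       # scores candidate dependency heads

    def forward(self, ids):
        positions = torch.arange(ids.size(1), device=ids.device)
        h = self.encoder(self.tok_embed(ids) + self.pos_embed(positions))
        # arc_scores[b, i, j] = score that token j is the syntactic head of token i
        arc_scores = self.arc_head(h) @ h.transpose(1, 2)
        return self.mlm_head(h), self.pos_head(h), arc_scores

# Dummy supervision; in practice POS tags and head indices come from a treebank,
# and masked input positions would also be replaced by a [MASK] token id.
ids = torch.randint(0, VOCAB, (BATCH, MAX_LEN))
mask = torch.rand(BATCH, MAX_LEN) < 0.15          # ~15% of positions are "masked"
mlm_tgt = torch.full_like(ids, -100)              # -100 = ignored by CrossEntropyLoss
mlm_tgt[mask] = ids[mask]
pos_tgt = torch.randint(0, N_POS, (BATCH, MAX_LEN))
head_tgt = torch.randint(0, MAX_LEN, (BATCH, MAX_LEN))

model = TinyMultitaskEncoder()
mlm_logits, pos_logits, arc_scores = model(ids)
ce = nn.CrossEntropyLoss(ignore_index=-100)
loss = (ce(mlm_logits.reshape(-1, VOCAB), mlm_tgt.reshape(-1))
        + ce(pos_logits.reshape(-1, N_POS), pos_tgt.reshape(-1))
        + ce(arc_scores.reshape(-1, MAX_LEN), head_tgt.reshape(-1)))
loss.backward()  # one multitask pretraining step; an optimizer update would follow
```

The point the sketch illustrates is that all three losses backpropagate into the same small encoder, so the treebank's labeled signal supplements the limited unlabeled text available for masked language modeling.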
