Paper Title

BERTweet: A pre-trained language model for English Tweets

Paper Authors

Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen

Paper Abstract

We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet, having the same architecture as BERT-base (Devlin et al., 2019), is trained using the RoBERTa pre-training procedure (Liu et al., 2019). Experiments show that BERTweet outperforms the strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better results than the previous state-of-the-art models on three Tweet NLP tasks: part-of-speech tagging, named-entity recognition and text classification. We release BERTweet under the MIT License to facilitate future research and applications on Tweet data. Our BERTweet is available at https://github.com/VinAIResearch/BERTweet.
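
Since the abstract notes that the model is publicly released, here is a minimal sketch of how it could be loaded and used to extract Tweet features through the Hugging Face transformers library. The Hub identifier "vinai/bertweet-base" and the normalized-Tweet input format (user mentions and URLs replaced by special tokens) are assumptions drawn from the linked GitHub repository, not from the abstract itself.

```python
# A minimal usage sketch, not the paper's official pipeline.
# Assumption: BERTweet is published on the Hugging Face Hub as
# "vinai/bertweet-base" (inferred from the linked repository).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
bertweet = AutoModel.from_pretrained("vinai/bertweet-base")

# An already-normalized Tweet: mentions and URLs are assumed to be
# replaced by the @USER and HTTPURL placeholder tokens.
tweet = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER"

# Encode to subword IDs and run a forward pass without gradients.
input_ids = torch.tensor([tokenizer.encode(tweet)])
with torch.no_grad():
    outputs = bertweet(input_ids)
    features = outputs[0]  # contextual embeddings, shape (1, seq_len, 768)
```

Because BERTweet shares the BERT-base architecture, the extracted features have hidden size 768 and can be fed to standard task heads for tagging or classification.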
