Paper Title

BERTweet: A pre-trained language model for English Tweets

Paper Authors

Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen

Paper Abstract

We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet, having the same architecture as BERT-base (Devlin et al., 2019), is trained using the RoBERTa pre-training procedure (Liu et al., 2019). Experiments show that BERTweet outperforms the strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better results than the previous state-of-the-art models on three Tweet NLP tasks: part-of-speech tagging, named-entity recognition and text classification. We release BERTweet under the MIT License to facilitate future research and applications on Tweet data. Our BERTweet is available at https://github.com/VinAIResearch/BERTweet.
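
Since the abstract notes that the model is publicly released, here is a minimal sketch of how it could be loaded and used to extract Tweet features through the Hugging Face transformers library. The Hub identifier "vinai/bertweet-base" and the normalized-Tweet input format (user mentions and URLs replaced by special tokens) are assumptions drawn from the linked GitHub repository, not from the abstract itself.

```python
# A minimal usage sketch, not the paper's official pipeline.
# Assumption: BERTweet is published on the Hugging Face Hub as
# "vinai/bertweet-base" (inferred from the linked repository).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
bertweet = AutoModel.from_pretrained("vinai/bertweet-base")

# An already-normalized Tweet: mentions and URLs are assumed to be
# replaced by the @USER and HTTPURL placeholder tokens.
tweet = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER"

# Encode to subword IDs and run a forward pass without gradients.
input_ids = torch.tensor([tokenizer.encode(tweet)])
with torch.no_grad():
    outputs = bertweet(input_ids)
    features = outputs[0]  # contextual embeddings, shape (1, seq_len, 768)
```

Because BERTweet shares the BERT-base architecture, the extracted features have hidden size 768 and can be fed to standard task heads for tagging or classification.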
