Paper Title

LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression

Authors

Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai

Abstract

BERT is a cutting-edge language representation model pre-trained on a large corpus, which achieves superior performance on various natural language understanding tasks. However, a major blocking issue of applying BERT to online services is that it is memory-intensive and leads to unsatisfactory latency of user requests, raising the necessity of model compression. Existing solutions leverage the knowledge distillation framework to learn a smaller model that imitates the behaviors of BERT. However, the training procedure of knowledge distillation is itself expensive, as it requires sufficient training data to imitate the teacher model. In this paper, we address this issue by proposing a hybrid solution named LadaBERT (Lightweight adaptation of BERT through hybrid model compression), which combines the advantages of different model compression methods, including weight pruning, matrix factorization and knowledge distillation. LadaBERT achieves state-of-the-art accuracy on various public datasets while the training overheads can be reduced by an order of magnitude.
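To make the three compression ingredients named in the abstract concrete, below is a minimal, illustrative sketch in PyTorch of magnitude-based weight pruning, SVD-based matrix factorization, and a soft-label knowledge-distillation loss. The helper names, toy dimensions, and hyperparameters (sparsity, rank, temperature) are assumptions for illustration only; this is not the authors' released implementation or the exact LadaBERT pipeline.

```python
# Illustrative sketch only: the three compression ingredients named in the abstract.
# Function names, dimensions, and hyperparameters are hypothetical choices.
import torch
import torch.nn.functional as F


def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries until a `sparsity` fraction is removed."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)


def low_rank_factorize(weight: torch.Tensor, rank: int):
    """Approximate a weight matrix W ~ A @ B with a truncated (rank-`rank`) SVD."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_dim, rank)
    B = Vh[:rank, :]             # (rank, in_dim)
    return A, B


def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-label KD loss: KL divergence between temperature-scaled distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)


if __name__ == "__main__":
    W = torch.randn(768, 768)                      # a toy BERT-sized weight matrix
    W_pruned = magnitude_prune(W, sparsity=0.5)    # weight pruning
    A, B = low_rank_factorize(W_pruned, rank=64)   # matrix factorization
    print("reconstruction error:", torch.norm(W_pruned - A @ B).item())

    student = torch.randn(8, 2)                    # toy logits for a binary task
    teacher = torch.randn(8, 2)
    print("KD loss:", distillation_loss(student, teacher).item())
```

In a hybrid scheme like the one the abstract describes, pruning and factorization shrink the student's weight matrices, while the distillation loss trains the compressed student to imitate the original BERT teacher.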
