Paper Title

Movement Pruning: Adaptive Sparsity by Fine-Tuning

Paper Authors

Victor Sanh, Thomas Wolf, Alexander M. Rush

Paper Abstract

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters.
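
The abstract describes the method at a high level: during fine-tuning, each weight receives a learned importance score, and only the top-scoring weights are kept, with gradients reaching the scores through a straight-through estimator. The snippet below is a minimal PyTorch sketch of that idea, not the authors' released implementation; the names `TopKBinarizer` and `MovementPrunedLinear` and the `keep_fraction` parameter are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKBinarizer(torch.autograd.Function):
    """Forward: binary mask keeping the top `keep_fraction` of scores.
    Backward: straight-through estimator, so gradients reach the scores."""

    @staticmethod
    def forward(ctx, scores, keep_fraction):
        mask = torch.zeros_like(scores)
        k = max(1, int(keep_fraction * scores.numel()))
        _, idx = torch.topk(scores.view(-1), k)
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient through to the scores unchanged;
        # keep_fraction is a constant and receives no gradient.
        return grad_output, None

class MovementPrunedLinear(nn.Module):
    """Illustrative linear layer masked by learned importance scores
    (hypothetical name). With the straight-through estimator, each score
    effectively accumulates -lr * dL/dW_ij * W_ij across training steps,
    so weights that move away from zero during fine-tuning survive."""

    def __init__(self, in_features, out_features, keep_fraction=0.03):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.scores = nn.Parameter(torch.zeros_like(self.weight))
        self.keep_fraction = keep_fraction

    def forward(self, x):
        mask = TopKBinarizer.apply(self.scores, self.keep_fraction)
        return F.linear(x, self.weight * mask, self.bias)

# Usage: fine-tune as usual; both the weights and the scores get gradients.
layer = MovementPrunedLinear(768, 768, keep_fraction=0.03)
out = layer(torch.randn(4, 768))
```

Under this straight-through scheme, the gradient of the loss with respect to a score S_ij works out to (dL/dW'_ij) * W_ij, so scores grow for weights moving away from zero, which is the first-order selection criterion the abstract refers to; setting `keep_fraction=0.03` corresponds to the "3% of the model parameters" regime mentioned above.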
