Paper Title

Movement Pruning: Adaptive Sparsity by Fine-Tuning

Paper Authors

Victor Sanh, Thomas Wolf, Alexander M. Rush

Paper Abstract

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters.
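
The abstract describes the method at a high level: during fine-tuning, each weight receives a learned importance score, and only the top-scoring weights are kept, with gradients reaching the scores through a straight-through estimator. The snippet below is a minimal PyTorch sketch of that idea, not the authors' released implementation; the names `TopKBinarizer` and `MovementPrunedLinear` and the `keep_fraction` parameter are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKBinarizer(torch.autograd.Function):
    """Forward: binary mask keeping the top `keep_fraction` of scores.
    Backward: straight-through estimator, so gradients reach the scores."""

    @staticmethod
    def forward(ctx, scores, keep_fraction):
        mask = torch.zeros_like(scores)
        k = max(1, int(keep_fraction * scores.numel()))
        _, idx = torch.topk(scores.view(-1), k)
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient through to the scores unchanged;
        # keep_fraction is a constant and receives no gradient.
        return grad_output, None

class MovementPrunedLinear(nn.Module):
    """Illustrative linear layer masked by learned importance scores
    (hypothetical name). With the straight-through estimator, each score
    effectively accumulates -lr * dL/dW_ij * W_ij across training steps,
    so weights that move away from zero during fine-tuning survive."""

    def __init__(self, in_features, out_features, keep_fraction=0.03):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.scores = nn.Parameter(torch.zeros_like(self.weight))
        self.keep_fraction = keep_fraction

    def forward(self, x):
        mask = TopKBinarizer.apply(self.scores, self.keep_fraction)
        return F.linear(x, self.weight * mask, self.bias)

# Usage: fine-tune as usual; both the weights and the scores get gradients.
layer = MovementPrunedLinear(768, 768, keep_fraction=0.03)
out = layer(torch.randn(4, 768))
```

Under this straight-through scheme, the gradient of the loss with respect to a score S_ij works out to (dL/dW'_ij) * W_ij, so scores grow for weights moving away from zero, which is the first-order selection criterion the abstract refers to; setting `keep_fraction=0.03` corresponds to the "3% of the model parameters" regime mentioned above.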
