Paper Title
Aligned Weight Regularizers for Pruning Pretrained Neural Networks
Paper Authors
Paper Abstract
While various avenues of research have been explored for iterative pruning, little is known about what effect pruning has on zero-shot test performance and its potential implications for the choice of pruning criteria. This pruning setup is particularly important for cross-lingual models that implicitly learn alignment between language representations during pretraining, which, if distorted via pruning, not only leads to poorer performance on the language data used for retraining but also on the zero-shot languages that are evaluated. In this work, we show that there is a clear performance discrepancy in magnitude-based pruning when comparing standard supervised learning to the zero-shot setting. From this finding, we propose two weight regularizers that aim to maximize the alignment between units of pruned and unpruned networks, mitigating alignment distortion in pruned cross-lingual models and performing well in both non-zero-shot and zero-shot settings. We provide experimental results on cross-lingual tasks for the zero-shot setting using XLM-RoBERTa$_{\mathrm{Base}}$, where we also find that pruning causes varying degrees of representational degradation depending on the language corresponding to the zero-shot test set. This is also the first study that focuses on cross-lingual language model compression.
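The abstract does not spell out the exact form of the two regularizers, but the overall recipe it describes (magnitude-based pruning during retraining, plus a penalty that keeps the pruned network aligned with the original unpruned network) can be illustrated. Below is a minimal PyTorch sketch under assumed details: the cosine-similarity penalty, the `magnitude_prune_masks`/`alignment_penalty`/`training_step` helpers, the 50% sparsity, and the `lam` weighting are all illustrative choices, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): magnitude pruning combined with a
# regularizer that penalizes misalignment between the pruned model and a
# frozen copy of the unpruned (dense) model during retraining.
import torch
import torch.nn.functional as F


def magnitude_prune_masks(model, sparsity=0.5):
    """Return a {param_name: 0/1 mask} that zeroes the smallest-|w| weights."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases / LayerNorm parameters
            continue
        k = int(param.numel() * sparsity)
        if k == 0:
            continue
        threshold = param.abs().flatten().kthvalue(k).values
        masks[name] = (param.abs() > threshold).float()
    return masks


def alignment_penalty(pruned_model, dense_model):
    """Cosine-style misalignment between corresponding weight matrices.

    This is an assumed stand-in for the alignment regularizers described in
    the abstract, whose precise definition is given in the paper itself.
    """
    dense_params = dict(dense_model.named_parameters())
    penalty = 0.0
    for name, w_pruned in pruned_model.named_parameters():
        if w_pruned.dim() < 2:
            continue
        w_dense = dense_params[name].detach()  # frozen unpruned reference
        penalty = penalty + (
            1.0 - F.cosine_similarity(w_pruned.flatten(), w_dense.flatten(), dim=0)
        )
    return penalty


def training_step(pruned_model, dense_model, masks, batch, optimizer, lam=0.1):
    """One retraining step: task loss + alignment regularizer, then re-apply masks."""
    optimizer.zero_grad()
    outputs = pruned_model(**batch)  # assumes a HuggingFace-style model returning .loss
    loss = outputs.loss + lam * alignment_penalty(pruned_model, dense_model)
    loss.backward()
    optimizer.step()
    with torch.no_grad():  # keep pruned weights at exactly zero after the update
        for name, param in pruned_model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
    return loss.item()
```

In this sketch the dense model serves only as a fixed reference, so the regularizer pulls the surviving weights of the pruned XLM-RoBERTa-style network back toward their pretrained values, which is one plausible way to preserve the cross-lingual alignment the abstract says pruning can distort.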