Paper Title
A Gradient Flow Framework For Analyzing Network Pruning
Paper Authors
Paper Abstract
Recent network pruning methods focus on pruning models early-on in training. To estimate the impact of removing a parameter, these methods use importance measures that were originally designed to prune trained models. Despite lacking justification for their use early-on in training, such measures result in surprisingly low accuracy loss. To better explain this behavior, we develop a general framework that uses gradient flow to unify state-of-the-art importance measures through the norm of model parameters. We use this framework to determine the relationship between pruning measures and evolution of model parameters, establishing several results related to pruning models early-on in training: (i) magnitude-based pruning removes parameters that contribute least to reduction in loss, resulting in models that converge faster than magnitude-agnostic methods; (ii) loss-preservation based pruning preserves first-order model evolution dynamics and is therefore appropriate for pruning minimally trained models; and (iii) gradient-norm based pruning affects second-order model evolution dynamics, such that increasing gradient norm via pruning can produce poorly performing models. We validate our claims on several VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10/CIFAR-100. Code available at https://github.com/EkdeepSLubana/flowandprune.
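The abstract contrasts three families of importance measures (magnitude-based, loss-preservation based, and gradient-norm based). Below is a minimal PyTorch sketch, not the paper's reference implementation, of per-parameter scores commonly associated with each family: |θ| for magnitude, |θ·∇L| for first-order loss preservation, and θ·(H∇L) for a second-order, gradient-norm related score. The exact formulas and function names here are illustrative assumptions; see the linked repository for the authors' code.

```python
import torch

def pruning_scores(model, loss_fn, inputs, targets):
    """Illustrative per-parameter importance scores (hypothetical sketch):
      - magnitude:          |theta|
      - loss_preservation:  |theta * g|        (first-order)
      - gradient_norm:      theta * (H g)      (second-order)
    where g is the loss gradient and H g is a Hessian-vector product.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)

    # First-order gradients, kept in the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Hessian-vector product H g via a second backward pass through g . stop_grad(g).
    grad_dot_grad = sum((g * g.detach()).sum() for g in grads)
    hvps = torch.autograd.grad(grad_dot_grad, params)

    scores = {"magnitude": [], "loss_preservation": [], "gradient_norm": []}
    for p, g, hg in zip(params, grads, hvps):
        scores["magnitude"].append(p.detach().abs())
        scores["loss_preservation"].append((p * g).detach().abs())
        scores["gradient_norm"].append((p * hg).detach())
    return scores
```

A typical usage would compute these scores on a small batch from a minimally trained model and remove the lowest-scoring parameters; under the abstract's claims, the first two measures are better suited to early pruning, while increasing the gradient norm via the third can hurt final performance.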