Paper title
Out-of-the-box channel pruned networks
Paper authors
Paper abstract
In the last decade, convolutional neural networks have become gargantuan. Pre-trained models, when used as initializers, make it possible to fine-tune ever larger networks on small datasets. Consequently, not all the convolutional features that these fine-tuned models detect are requisite for the end task. Several channel pruning methods have been proposed to prune away compute and memory from already-trained models. Typically, these involve policies that decide which and how many channels to remove from each layer, leading to channel-wise and/or layer-wise pruning profiles, respectively. In this paper, we conduct several baseline experiments and establish that profiles from random channel-wise pruning policies are as good as metric-based ones. We also establish that there may exist profiles from some layer-wise pruning policies that are measurably better than common baselines. We then demonstrate that the top layer-wise pruning profiles found via exhaustive random search on one dataset are also among the top profiles for other datasets. This implies that we could identify out-of-the-box layer-wise pruning profiles using benchmark datasets and use them directly on new datasets. Furthermore, we develop a Reinforcement Learning (RL) policy-based search algorithm whose direct objective is to find transferable layer-wise pruning profiles using many models of the same architecture. We use a novel reward formulation that drives this RL search towards an expected compression while maximizing accuracy. Our results show that our transferred RL-based profiles are as good as or better than the best profiles found on the original dataset via exhaustive search. We then demonstrate that if we find the profiles using a mid-sized dataset such as CIFAR10/100, we are able to transfer them even to a large dataset such as ImageNet.
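To make the notion of a layer-wise pruning profile concrete, here is a minimal Python sketch (not the authors' code) of how such a profile could be applied: the profile assigns one keep-ratio per conv layer, and within each layer the channels to drop are picked uniformly at random, mirroring the random channel-wise policy that the abstract reports to be competitive with metric-based ones. The layer widths and ratios below are hypothetical.

```python
import random

def random_channel_selection(num_channels: int, keep_ratio: float) -> list[int]:
    """Pick, uniformly at random, which channels of a layer survive pruning."""
    keep = max(1, round(num_channels * keep_ratio))
    return sorted(random.sample(range(num_channels), keep))

# Hypothetical layer-wise profile for a 4-conv-layer network:
profile = [0.75, 0.5, 0.5, 0.9]   # keep-ratio per layer (the "profile")
widths  = [64, 128, 256, 512]     # original output-channel counts per layer

kept = [random_channel_selection(w, r) for w, r in zip(widths, profile)]
for i, (w, idx) in enumerate(zip(widths, kept)):
    print(f"layer {i}: keep {len(idx)}/{w} channels")
```

Transferring a profile between datasets then amounts to reusing the per-layer keep-ratios while re-drawing (or re-ranking) the channels on the new task.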
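The abstract states that the RL reward drives the search towards an expected compression while maximizing accuracy, but does not give the formula. One common way to encode such a constraint, shown below purely as an illustrative assumption rather than the paper's exact reward, is accuracy minus a penalty on deviation from the target compression; the `penalty_weight` value is hypothetical.

```python
def reward(accuracy: float, compression: float,
           target_compression: float, penalty_weight: float = 5.0) -> float:
    """Hypothetical reward: favor high accuracy with compression near the target."""
    return accuracy - penalty_weight * abs(compression - target_compression)

# e.g. a profile reaching 0.48 compression at 92.3% accuracy, target 0.5:
print(reward(accuracy=0.923, compression=0.48, target_compression=0.5))
```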