Paper Title
Does Interference Exist When Training a Once-For-All Network?
Paper Authors
Paper Abstract
The Once-For-All (OFA) method offers an excellent pathway to deploy a trained neural network model to multiple target platforms by utilising the supernet-subnet architecture. Once trained, a subnet can be derived from the supernet (both architecture and trained weights) and deployed directly to the target platform with little to no retraining or fine-tuning. To train the subnet population, OFA uses a novel training method called Progressive Shrinking (PS), which is designed to limit the negative impact of interference during training; the belief is that higher interference during training results in lower subnet population accuracies. In this work, we take a second look at this interference effect. Surprisingly, we find that interference-mitigation strategies do not have a large impact on overall subnet population performance. Instead, we find that the subnet architecture selection bias during training is a more important factor. To show this, we propose a simple-yet-effective method called Random Subnet Sampling (RSS), which does not mitigate the interference effect. Despite this, RSS produces a better-performing subnet population than PS on four small-to-medium-sized datasets, suggesting that the interference effect does not play a pivotal role on these datasets. Due to its simplicity, RSS provides a $1.9\times$ reduction in training time compared to PS. A $6.1\times$ reduction can also be achieved, with a reasonable drop in performance, when the number of RSS training epochs is reduced. Code is available at https://github.com/Jordan-HS/RSS-Interference-CVPRW2022.
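The sketch below illustrates the training idea the abstract attributes to RSS: at every step a subnet is drawn uniformly at random from the supernet's search space (no progressive staging, no architecture selection bias), and the shared supernet weights are updated through that subnet alone. This is a minimal toy example with an elastic-width MLP; the class and function names (ElasticLinear, ToySupernet, rss_training_step) are illustrative assumptions and are not taken from the authors' repository.

```python
# Minimal sketch of Random Subnet Sampling (RSS) on a toy elastic-width
# supernet. Each training step samples a subnet uniformly at random and
# updates the shared weights, with no interference mitigation as in PS.
# Names and architecture here are illustrative, not the paper's actual code.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class ElasticLinear(nn.Module):
    """Linear layer whose active output width can be shrunk at run time."""

    def __init__(self, in_features, max_out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(max_out_features))

    def forward(self, x, active_out):
        # Use only the first `active_out` output units, and only as many
        # input units as the (possibly shrunk) incoming activation provides.
        w = self.weight[:active_out, : x.size(1)]
        b = self.bias[:active_out]
        return F.linear(x, w, b)


class ToySupernet(nn.Module):
    """Two elastic hidden layers; the width of each is a searchable dimension."""

    def __init__(self, in_dim=784, num_classes=10, width_choices=(64, 128, 256)):
        super().__init__()
        self.width_choices = width_choices
        max_w = max(width_choices)
        self.fc1 = ElasticLinear(in_dim, max_w)
        self.fc2 = ElasticLinear(max_w, max_w)
        self.head = nn.Linear(max_w, num_classes)

    def sample_subnet(self):
        # RSS: uniform random choice per elastic dimension, no selection bias.
        return [random.choice(self.width_choices) for _ in range(2)]

    def forward(self, x, widths):
        w1, w2 = widths
        x = F.relu(self.fc1(x, w1))
        x = F.relu(self.fc2(x, w2))
        # Zero-pad back to the head's full input width so the shared head
        # works for every sampled subnet.
        x = F.pad(x, (0, self.head.in_features - x.size(1)))
        return self.head(x)


def rss_training_step(model, optimizer, images, labels):
    widths = model.sample_subnet()      # one random subnet per batch
    logits = model(images, widths)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()                     # gradients flow only into the active slices
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = ToySupernet()
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    for step in range(5):               # stand-in for a real data loader
        x = torch.randn(32, 784)
        y = torch.randint(0, 10, (32,))
        print(step, rss_training_step(model, opt, x, y))
```

After training, any width combination from `width_choices` can be evaluated directly by passing it to `forward`, mirroring how a subnet is derived from a trained supernet and deployed without fine-tuning.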