Paper Title
Principal Component Networks: Parameter Reduction Early in Training
Paper Authors
Paper Abstract
Recent works show that overparameterized networks contain small subnetworks that exhibit comparable accuracy to the full model when trained in isolation. These results highlight the potential to reduce the training costs of deep neural networks without sacrificing generalization performance. However, existing approaches for finding these small networks rely on expensive multi-round train-and-prune procedures and are impractical for large data sets and models. In this paper, we show how to find small networks that exhibit the same performance as their overparameterized counterparts after only a few training epochs. We find that hidden layer activations in overparameterized networks exist primarily in subspaces smaller than the actual model width. Building on this observation, we use PCA to find a high-variance basis for layer inputs and represent layer weights using these directions. We eliminate all weights not relevant to the found PCA basis and term these network architectures Principal Component Networks. On CIFAR-10 and ImageNet, we show that PCNs train faster and use less energy than overparameterized models, without accuracy loss. We find that our transformation leads to networks with up to 23.8x fewer parameters and equal or higher end-model accuracy; in some cases we observe improvements of up to 3%. We also show that ResNet-20 PCNs outperform deep ResNet-110 networks while training faster.
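To make the transformation described in the abstract concrete, the following is a minimal NumPy sketch of the core idea, not the paper's actual implementation: collect a layer's input activations after a few epochs, find their top-k principal directions with PCA, and re-express the layer's weight matrix in that reduced basis. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def pcn_compress_layer(X, W, k):
    """Illustrative PCA-based compression of one dense layer (sketch only).

    X: (n_samples, d_in) layer inputs collected after a few training epochs
    W: (d_in, d_out) the layer's trained weight matrix
    k: number of principal directions to keep (k < d_in)

    Returns the activation mean, the top-k basis U_k, and reduced weights W_k,
    so that (X - mean) @ U_k @ W_k approximates the centered output (X - mean) @ W
    (the constant offset mean @ W can be folded into the layer bias).
    """
    # Center the activations and find their high-variance directions via PCA.
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(Xc) - 1)
    eigvals, U = np.linalg.eigh(cov)           # columns of U are principal directions
    order = np.argsort(eigvals)[::-1]          # sort by descending variance
    U_k = U[:, order[:k]]                      # keep top-k basis, shape (d_in, k)

    # Express the weights in the reduced basis: the compressed layer maps
    # projected inputs (n, k) to outputs (n, d_out) with only k * d_out weights.
    W_k = U_k.T @ W                            # shape (k, d_out)
    return mean, U_k, W_k

def pcn_forward(X, mean, U_k, W_k):
    # Forward pass of the compressed layer: project onto the PCA basis,
    # then apply the reduced weight matrix.
    return (X - mean) @ U_k @ W_k
```

This sketch captures only the parameter-reduction step for a single fully connected layer; the paper applies the idea early in training and to architectures such as ResNets, which involve additional details not shown here.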