Paper Title
What do CNNs Learn in the First Layer and Why? A Linear Systems Perspective
Paper Authors
Paper Abstract
It has previously been reported that the representation learned in the first layer of deep Convolutional Neural Networks (CNNs) is highly consistent across initializations and architectures. In this work, we quantify this consistency by considering the first layer as a filter bank and measuring its energy distribution. We find that the energy distribution is very different from that of the initial weights and is remarkably consistent across random initializations, datasets, and architectures, even when the CNNs are trained with random labels. To explain this consistency, we derive an analytical formula for the energy profile of linear CNNs and show that this profile is mostly dictated by the second-order statistics of image patches in the training set and approaches a whitening transformation as the number of iterations goes to infinity. Finally, we show that this formula for linear CNNs also gives an excellent fit to the energy profiles learned by commonly used nonlinear CNNs such as ResNet and VGG, and that the first layer of these CNNs indeed performs approximate whitening of its inputs.
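A note on the whitening transformation referenced above (this states the textbook PCA-whitening form for orientation only; the paper's own analytical formula for the energy profile is not reproduced here). If image patch vectors x have covariance with eigendecomposition

    C = U Λ U^T,

then the whitening map

    W_white = Λ^{-1/2} U^T

satisfies Cov(W_white x) = I; it decorrelates the patches and equalizes their variance across principal directions. The abstract's claim is that, as the number of training iterations goes to infinity, the first-layer filter bank of a linear CNN approaches such a transform.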
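As a minimal sketch of how the energy distribution of a first-layer filter bank could be measured (illustrative only: function names such as energy_profile are not from the paper, and the authors' exact patch extraction and normalization may differ), the following NumPy code projects flattened filters onto the principal components of image patches and compares the result against the profile of an exact whitening filter bank:

    import numpy as np

    def patch_covariance(images, k):
        # Covariance of k x k patches from a stack of grayscale images
        # of shape (N, H, W); patches are taken on a non-overlapping
        # grid and mean-centered.
        patches = []
        for img in images:
            H, W = img.shape
            for i in range(0, H - k + 1, k):
                for j in range(0, W - k + 1, k):
                    patches.append(img[i:i + k, j:j + k].ravel())
        P = np.asarray(patches)
        P = P - P.mean(axis=0)
        return P.T @ P / len(P)

    def energy_profile(filters, cov):
        # Fraction of the filter bank's energy along each principal
        # component of the patch distribution; `filters` has shape
        # (num_filters, k * k).
        eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
        U = eigvecs[:, ::-1]                     # descending order
        coeffs = filters @ U                     # project filters onto PCs
        energy = (coeffs ** 2).sum(axis=0)
        return energy / energy.sum()

    def whitening_profile(cov, eps=1e-8):
        # Energy profile of an exact whitening bank W = Lambda^{-1/2} U^T,
        # whose energy along the i-th PC is proportional to 1 / lambda_i.
        eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
        energy = 1.0 / (eigvals + eps)
        return energy / energy.sum()

    # Toy usage with random data standing in for a real dataset and for
    # trained conv1 weights (both hypothetical placeholders):
    rng = np.random.default_rng(0)
    cov = patch_covariance(rng.standard_normal((100, 32, 32)), k=5)
    learned = energy_profile(rng.standard_normal((64, 25)), cov)
    target = whitening_profile(cov)

Plotting learned against target (e.g., on a log scale) is one way to visualize how closely a trained first layer approximates whitening; with real conv1 weights, the filters would first be flattened, and color inputs would be handled per channel or flattened to k * k * 3 dimensions.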