Paper Title

The Early Phase of Neural Network Training

Paper Authors

Jonathan Frankle, David J. Schwab, Ari S. Morcos

Paper Abstract

Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that deep neural networks undergo during this early phase of training. We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset. We find that, within this framework, deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent even after only a few hundred iterations. Despite this behavior, pre-training with blurred inputs or an auxiliary self-supervised task can approximate the changes in supervised networks, suggesting that these changes are not inherently label-dependent, though labels significantly accelerate this process. Together, these results help to elucidate the network changes occurring during this pivotal initial period of learning.
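
The "reinitializing with random weights while maintaining signs" probe mentioned in the abstract can be sketched in a few lines of NumPy. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function name reinit_keep_signs is hypothetical, and using the trained weights' standard deviation as a stand-in for the layer's original init distribution is an assumption of this sketch.

```python
import numpy as np

def reinit_keep_signs(weights: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sign-preserving reinitialization probe (illustrative sketch).

    Draws fresh random weights, then transfers the signs of the trained
    weights onto the fresh magnitudes. The trained weights' standard
    deviation stands in for the layer's original init distribution,
    which is an assumption of this sketch, not the paper's recipe.
    """
    fresh = rng.standard_normal(weights.shape) * weights.std()
    return np.sign(weights) * np.abs(fresh)

# Hypothetical usage on a single layer's weight matrix:
rng = np.random.default_rng(0)
w_trained = 0.05 * rng.standard_normal((256, 128))  # placeholder for weights after a few hundred training iterations
w_probe = reinit_keep_signs(w_trained, rng)
assert np.array_equal(np.sign(w_probe), np.sign(w_trained))  # signs preserved exactly
```

In the framework of Frankle et al. (2019), the perturbed network would then be retrained and judged robust if it recovers the original accuracy; the abstract's finding is that early-phase networks fail this test, so their state is not captured by weight signs alone.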
