Paper Title

Phase diagram for two-layer ReLU neural networks at infinite-width limit

Authors

Tao Luo, Zhi-Qin John Xu, Zheng Ma, Yaoyu Zhang

Abstract

How a neural network behaves during training under different choices of hyperparameters is an important question in the study of neural networks. In this work, inspired by phase diagrams in statistical mechanics, we draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit, giving a complete characterization of its dynamical regimes and their dependence on hyperparameters related to initialization. Through both experimental and theoretical approaches, we identify three regimes in the phase diagram, i.e., the linear regime, the critical regime, and the condensed regime, based on the relative change of input weights as the width approaches infinity, which tends to $0$, $O(1)$, and $+\infty$, respectively. In the linear regime, the NN training dynamics is approximately linear, similar to a random feature model, with an exponential loss decay. In the condensed regime, we demonstrate through experiments that active neurons condense at several discrete orientations. The critical regime serves as the boundary between the above two regimes and exhibits intermediate nonlinear behavior, with the mean-field model as a typical example. Overall, our phase diagram for the two-layer ReLU NN serves as a map for future studies and is a first step towards a more systematic investigation of the training behavior and implicit regularization of NNs with different structures.
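The abstract's regime diagnostic is the relative change of the input weights during training. Below is a minimal NumPy sketch of how one might measure it, taking $\|W(t)-W(0)\|_F / \|W(0)\|_F$ as one natural concrete measure. The setup is an assumption for illustration only: folding the initialization scale into an output prefactor $m^{-\gamma}$, the learning-rate rescaling, the toy data, and the tested $\gamma$ values are hypothetical choices, not the paper's exact parameterization or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 2, 1000            # samples, input dimension, width
lr0, steps = 0.02, 5000

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])              # arbitrary smooth 1-D target

def relative_change(gamma):
    """Train f(x) = m**-gamma * sum_k a_k relu(w_k . x) by gradient descent
    and return ||W(t) - W(0)||_F / ||W(0)||_F for the input weights."""
    W = rng.normal(size=(m, d))  # input weights w_k (rows of W)
    a = rng.normal(size=m)       # output weights a_k
    W0 = W.copy()
    c = m ** -gamma              # output scaling factor (assumed form)
    # Heuristic rescaling so different gammas train on comparable timescales
    # (an assumption of this sketch, not a rule from the paper):
    lr = lr0 * m ** (2 * gamma - 1)
    for _ in range(steps):
        pre = X @ W.T                        # (n, m) pre-activations
        h = np.maximum(pre, 0.0)             # ReLU features
        err = c * (h @ a) - y                # residuals f(x_i) - y_i
        grad_a = c * (h.T @ err) / n         # gradient of (1/2n)||err||^2
        grad_W = c * ((err[:, None] * (pre > 0) * a).T @ X) / n
        a -= lr * grad_a
        W -= lr * grad_W
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

# Expected qualitative trend: small relative change for NTK-like scaling
# (linear regime), O(1) near mean-field scaling (critical regime), and a
# large relative change for smaller output scales (condensed regime).
for gamma in (0.5, 1.0, 1.25):
    print(f"gamma = {gamma}: relative change = {relative_change(gamma):.3f}")
```

With these assumed scalings, the larger output prefactor should behave lazily, so the input weights barely move, while smaller prefactors force the input weights to move substantially to fit the data, matching the trend described in the abstract.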
