Paper Title
Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry
Paper Authors
Paper Abstract
We consider the optimization problem associated with fitting two-layer ReLU networks with respect to the squared loss, where labels are generated by a target network. We leverage the rich symmetry structure to analytically characterize the Hessian at various families of spurious minima in the natural regime where the number of inputs $d$ and the number of hidden neurons $k$ are finite. In particular, we prove that for $d \ge k$ standard Gaussian inputs: (a) of the $dk$ eigenvalues of the Hessian, $dk - O(d)$ concentrate near zero, (b) $\Omega(d)$ of the eigenvalues grow linearly with $k$. Although this phenomenon of an extremely skewed spectrum has been observed many times before, to our knowledge this is the first time it has been established rigorously. Our analytic approach uses techniques, new to the field, from symmetry breaking and representation theory, and carries important implications for our ability to argue about statistical generalization through local curvature.
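The abstract's claim concerns the eigenvalue spectrum of the $dk \times dk$ Hessian of the squared loss at minima of the student-teacher problem. The following is a minimal numerical sketch, not the paper's analytic method: it fits a two-layer "student" ReLU network $\sum_i \mathrm{relu}(w_i^\top x)$ to labels from a random "teacher" of the same architecture using an empirical squared loss over standard Gaussian inputs, then inspects the Hessian spectrum at whatever point gradient descent reaches (which may be a global or a spurious minimum). The dimensions, sample size, step size, and teacher construction are illustrative assumptions, not taken from the paper.

```python
import jax
import jax.numpy as jnp

d, k, n = 8, 4, 20000          # input dim, hidden neurons, Monte Carlo samples (assumed)
key = jax.random.PRNGKey(0)
kx, kv, kw = jax.random.split(key, 3)

X = jax.random.normal(kx, (n, d))          # standard Gaussian inputs x ~ N(0, I_d)
V = jax.random.normal(kv, (k, d))          # teacher weights (assumed setup)
y = jnp.sum(jax.nn.relu(X @ V.T), axis=1)  # teacher-generated labels

def loss(W_flat):
    """Empirical squared loss of the student network with dk parameters."""
    W = W_flat.reshape(k, d)
    pred = jnp.sum(jax.nn.relu(X @ W.T), axis=1)
    return 0.5 * jnp.mean((pred - y) ** 2)

# Crude gradient descent from a random start; it may land near a global or a
# spurious minimum -- the paper characterizes the Hessian at the latter analytically.
W = 0.1 * jax.random.normal(kw, (k * d,))
grad = jax.jit(jax.grad(loss))
for _ in range(5000):
    W = W - 0.01 * grad(W)

H = jax.hessian(loss)(W)                   # (dk, dk) Hessian of the empirical loss
eigs = jnp.sort(jnp.linalg.eigvalsh(H))
print("smallest eigenvalues:", eigs[:5])
print("largest eigenvalues:", eigs[-5:])   # expect few large eigenvalues, most near zero
```

Under the paper's result, at spurious minima one would expect all but $O(d)$ of the $dk$ eigenvalues printed above to concentrate near zero, with $\Omega(d)$ of them growing linearly in $k$; this finite-sample sketch only illustrates how such a spectrum can be probed numerically.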