Paper Title

Ridge Regression with Over-Parametrized Two-Layer Networks Converge to Ridgelet Spectrum

Authors

Sho Sonoda, Isao Ishikawa, Masahiro Ikeda

Abstract

Characterization of local minima draws much attention in theoretical studies of deep learning. In this study, we investigate the distribution of parameters in an over-parametrized finite neural network trained by ridge-regularized empirical square risk minimization (RERM). We develop a new theory of the ridgelet transform, a wavelet-like integral transform that provides a powerful and general framework for the theoretical study of neural networks with not only the ReLU but also general activation functions. We show that the distribution of the parameters converges to a spectrum of the ridgelet transform. This result provides new insight into the characterization of the local minima of neural networks, and into the theoretical background of an inductive-bias theory based on lazy regimes. Through numerical experiments with finite models, we confirm the visual resemblance between the parameter distribution trained by SGD and the ridgelet spectrum computed by numerical integration.
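To make the setting concrete, here is a minimal NumPy sketch of the ridge-regularized square-risk objective for an over-parameterized two-layer ReLU network. All specifics (the 1-D `sin` target, the widths `n` and `m`, the parameter distributions, and `lam`) are illustrative assumptions, not the paper's experimental setup; for simplicity the objective is minimized in closed form over the output weights only (a lazy-regime simplification), whereas the paper trains all parameters with SGD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (hypothetical target; the paper's experiments differ).
n = 100
x = np.linspace(-1.0, 1.0, n)
y = np.sin(np.pi * x)

# Over-parameterized two-layer ReLU network f(x) = sum_j c_j * relu(a_j*x - b_j),
# with randomly drawn hidden parameters (a_j, b_j) and width m >> n.
m = 500
a = rng.normal(size=m)
b = rng.uniform(-2.0, 2.0, size=m)

H = np.maximum(np.outer(x, a) - b, 0.0)   # (n, m) hidden-layer features

# Ridge-regularized empirical square risk, solved exactly for the output
# weights c (lazy-regime simplification of RERM):
#   min_c (1/n) * ||H c - y||^2 + lam * ||c||^2
lam = 1e-3
c = np.linalg.solve(H.T @ H + n * lam * np.eye(m), H.T @ y)

mse = np.mean((H @ c - y) ** 2)   # training fit of the over-parameterized model
```

After fitting, the paper's claim can be inspected visually: plot the output weights `c_j` over the `(a_j, b_j)` plane (e.g., as a colored scatter) and compare the picture with the ridgelet spectrum of the target computed by numerical integration, which is the kind of comparison reported in the paper's experiments.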
