Paper Title

Support Vectors and Gradient Dynamics of Single-Neuron ReLU Networks

Paper Authors

Sangmin Lee, Byeongsu Sim, Jong Chul Ye

Paper Abstract

Understanding the implicit bias of gradient descent and its role in the generalization capability of ReLU networks has been an important research topic in machine learning. Unfortunately, even for a single ReLU neuron trained with the square loss, it was recently shown to be impossible to characterize the implicit regularization in terms of a norm of the model parameters (Vardi & Shamir, 2021). In order to close the gap toward understanding the intriguing generalization behavior of ReLU networks, here we examine the gradient flow dynamics in the parameter space when training single-neuron ReLU networks. Specifically, we discover an implicit bias in terms of support vectors, which plays a key role in why and how ReLU networks generalize well. Moreover, we analyze gradient flows with respect to the magnitude of the norm of the initialization, and show that the norm of the learned weight strictly increases along the gradient flow. Lastly, we prove the global convergence of a single ReLU neuron for the $d = 2$ case.
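To make the setting concrete, below is a minimal sketch (not the authors' code) of the training setup the abstract describes: a single ReLU neuron f(x) = max(<w, x>, 0) fitted with the square loss by plain gradient descent, used here as a discrete-time stand-in for gradient flow. The teacher weight, data distribution, learning rate, and d = 2 dimension are illustrative assumptions; the printout of ||w|| simply lets one observe the claimed norm growth during training.

```python
import numpy as np

# Minimal sketch (assumptions: teacher-student data, d = 2, small-norm init).
rng = np.random.default_rng(0)
d, n = 2, 50
w_star = np.array([1.0, 0.5])        # hypothetical teacher weight
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0)      # labels from a teacher ReLU neuron

w = 0.01 * rng.normal(size=d)        # small-norm initialization
lr = 0.05                            # small step size approximating gradient flow

for step in range(2001):
    pre = X @ w
    out = np.maximum(pre, 0.0)
    # Gradient of (1/2n) * sum_i (out_i - y_i)^2 w.r.t. w; ReLU derivative is 1{pre > 0}.
    grad = X.T @ ((out - y) * (pre > 0)) / n
    w -= lr * grad
    if step % 500 == 0:
        loss = np.mean((out - y) ** 2) / 2
        print(f"step {step:4d}  loss {loss:.6f}  ||w|| {np.linalg.norm(w):.4f}")
```

Running this sketch, the printed ||w|| grows monotonically from its small initial value while the loss decreases, which is the qualitative behavior the abstract attributes to the gradient flow; the actual paper proves these statements for the continuous-time dynamics rather than for this discrete approximation.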
