Paper Title
Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
Paper Authors
Paper Abstract
Understanding the implicit bias of training algorithms is of crucial importance in explaining the success of overparametrised neural networks. In this paper, we study the role of label noise in the training dynamics of a quadratically parametrised model through its continuous-time version. We explicitly characterise the solution selected by the stochastic flow and prove that it implicitly solves a Lasso program. To complete our analysis, we provide non-asymptotic convergence guarantees for the dynamics, as well as conditions for support recovery. We also present experimental results that support our theoretical claims. Our findings highlight the fact that structured noise can induce better generalisation and help explain the superior performance of stochastic dynamics observed in practice.
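The phenomenon described in the abstract can be illustrated with a minimal simulation: gradient descent with freshly resampled label noise at each step, on a quadratically parametrised linear model `w = u*u - v*v` with small initialisation, tends to select a sparse (Lasso-like) interpolant. This is a sketch only; the dimensions, step size, noise level, and initialisation scale below are illustrative choices, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse-recovery instance (hypothetical choices, not from the paper):
# n < d, with a k-sparse ground-truth vector beta.
n, d, k = 10, 20, 2
X = rng.standard_normal((n, d)) / np.sqrt(n)
beta = np.zeros(d)
beta[:k] = 1.0
y = X @ beta

# Quadratic parametrisation w = u*u - v*v with small initialisation alpha.
alpha = 0.01
u = alpha * np.ones(d)
v = alpha * np.ones(d)

eta, sigma, T = 0.01, 0.1, 20000
init_loss = 0.5 * np.sum(y ** 2)  # squared-error loss at w ~ 0

for _ in range(T):
    # Label noise: the targets are perturbed afresh at every iteration.
    y_noisy = y + sigma * rng.standard_normal(n)
    w = u * u - v * v
    g = X.T @ (X @ w - y_noisy)    # gradient of the loss w.r.t. w
    u = u - eta * 2.0 * u * g      # chain rule through u*u
    v = v + eta * 2.0 * v * g      # chain rule through -v*v

w = u * u - v * v
loss = 0.5 * np.sum((X @ w - y) ** 2)
print(loss, np.round(w, 2))
```

With these settings the iterate concentrates on the true support and fits the clean labels closely, consistent with the Lasso-type implicit bias the abstract describes; removing the noise or enlarging the initialisation changes which interpolant is reached.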