Paper Title
The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Paper Authors
Paper Abstract
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).
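To make the comparison described in the abstract concrete, below is a minimal numerical sketch (not the paper's experiments): it runs mini-batch SGD on a synthetic least squares problem and compares the iterate at elapsed "time" $t$ (step size times number of iterations) with the ridge solution at $\lambda = 1/t$. The problem dimensions, step size, batch size, and iteration count are hypothetical choices for illustration only.

```python
import numpy as np

# Minimal sketch: mini-batch SGD on least squares vs. ridge at lambda = 1/t.
# All settings below are illustrative, not the paper's experimental setup.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=0.5, size=n)

step_size = 0.01     # hypothetical step size
batch_size = 10      # hypothetical mini-batch size
n_iters = 500        # hypothetical number of iterations

# Mini-batch SGD on the objective (1/(2n)) * ||y - X b||^2.
beta = np.zeros(p)
for _ in range(n_iters):
    idx = rng.choice(n, size=batch_size, replace=False)
    grad = X[idx].T @ (X[idx] @ beta - y[idx]) / batch_size
    beta -= step_size * grad

t = step_size * n_iters   # elapsed time along the (discretized) flow
lam = 1.0 / t             # ridge tuning parameter matched to time

# Ridge solution of (1/(2n)) * ||y - X b||^2 + (lam/2) * ||b||^2.
beta_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

print("||beta_sgd - beta_ridge|| =", np.linalg.norm(beta - beta_ridge))
```

The printed distance between the two coefficient vectors gives a rough, empirical sense of how closely the SGD path at time $t$ tracks ridge at $\lambda = 1/t$; the paper's bounds quantify this relationship in terms of the mini-batch size, step size, and number of iterations.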