Paper title
Implicit bias of any algorithm: bounding bias via margin
Paper authors
Paper abstract
Consider $n$ points $x_1,\ldots,x_n$ in finite-dimensional Euclidean space, each having one of two colors. Suppose there exists a separating hyperplane (identified with its unit normal vector $w$) for the points, i.e., a hyperplane such that points of the same color lie on the same side of it. We measure the quality of such a hyperplane by its margin $γ(w)$, defined as the minimum distance between any of the points $x_i$ and the hyperplane. In this paper, we prove that the margin function $γ$ satisfies a nonsmooth Kurdyka-Łojasiewicz inequality with exponent $1/2$. This result has far-reaching consequences. For example, let $γ^{opt}$ be the maximum possible margin for the problem and let $w^{opt}$ be the parameter of the hyperplane attaining this value. Given any other separating hyperplane with parameter $w$, let $d(w):=\|w-w^{opt}\|$ be the Euclidean distance between $w$ and $w^{opt}$, also called the bias of $w$. From the preceding KL inequality, we deduce that $(γ^{opt}-γ(w))/R \le d(w) \le 2\sqrt{(γ^{opt}-γ(w))/γ^{opt}}$, where $R:=\max_i \|x_i\|$ is the maximum distance of the points $x_i$ from the origin. Consequently, for any optimization algorithm (gradient-descent-based or not), the bias of the iterates converges at least as fast as the square root of the rate of convergence of their margin. Thus, our work provides a generic tool for analyzing the implicit bias of any algorithm in terms of its margin, in situations where a specialized analysis might not be available: it suffices to establish a good rate of convergence for the margin, a task which is usually much easier.
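As an illustrative sanity check (not from the paper), the sketch below verifies the two-sided bound $(γ^{opt}-γ(w))/R \le d(w) \le 2\sqrt{(γ^{opt}-γ(w))/γ^{opt}}$ numerically on a toy 2D dataset. It restricts attention to hyperplanes through the origin (an assumption; the abstract does not specify an offset), approximates $w^{opt}$ by a brute-force sweep over unit normals, and checks the bound for every separating direction on the grid. The dataset, grid resolution, and tolerance are all illustrative assumptions.

```python
import numpy as np

# Hedged numerical check of the abstract's bound
#   (gamma_opt - gamma(w)) / R <= ||w - w_opt|| <= 2*sqrt((gamma_opt - gamma(w)) / gamma_opt)
# on a toy 2D dataset; nothing here is taken from the paper itself.

rng = np.random.default_rng(0)

# Two linearly separable point clouds; colors encoded as labels y in {-1, +1}.
X = np.vstack([rng.normal([+2.0, 0.0], 0.3, size=(20, 2)),
               rng.normal([-2.0, 0.0], 0.3, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

# Candidate unit normals w(theta) = (cos theta, sin theta) for hyperplanes
# through the origin (a dense sweep is enough in 2D).
angles = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# gamma(w) = min_i y_i <w, x_i>: for unit w this is the signed distance of the
# closest point to the hyperplane, and it is positive iff w separates the colors.
margins = (y[:, None] * (X @ W.T)).min(axis=0)

w_opt = W[np.argmax(margins)]          # approximate max-margin direction w^{opt}
gamma_opt = margins.max()              # approximate gamma^{opt}
R = np.linalg.norm(X, axis=1).max()    # max distance of the points from the origin

sep = margins > 0                      # the bound only covers separating hyperplanes
d = np.linalg.norm(W[sep] - w_opt, axis=1)   # bias d(w) = ||w - w_opt||
gap = gamma_opt - margins[sep]               # margin suboptimality (>= 0 on the grid)

tol = 1e-4  # slack for the finite-grid approximation of w_opt and gamma_opt
assert np.all(gap / R <= d + tol), "lower bound violated"
assert np.all(d <= 2.0 * np.sqrt(gap / gamma_opt) + tol), "upper bound violated"
print(f"bound verified for {sep.sum()} separating directions "
      f"(gamma_opt ~ {gamma_opt:.3f}, R ~ {R:.3f})")
```

The lower bound holds here because $γ$, a minimum of linear functions with gradients bounded by $R$, is $R$-Lipschitz in $w$; the upper bound is the KL-type consequence stated in the abstract. A real experiment would replace the brute-force sweep with a hard-margin SVM solver.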