Paper title
Implicit bias of any algorithm: bounding bias via margin
Paper authors
Paper abstract
Consider $n$ points $x_1,\ldots,x_n$ in finite-dimensional Euclidean space, each having one of two colors. Suppose there exists a separating hyperplane (identified with its unit normal vector $w$) for the points, i.e., a hyperplane such that points of the same color lie on the same side of it. We measure the quality of such a hyperplane by its margin $γ(w)$, defined as the minimum distance between any of the points $x_i$ and the hyperplane. In this paper, we prove that the margin function $γ$ satisfies a nonsmooth Kurdyka-Łojasiewicz inequality with exponent $1/2$. This result has far-reaching consequences. For example, let $γ^{opt}$ be the maximum possible margin for the problem and let $w^{opt}$ be the parameter of the hyperplane attaining this value. Given any other separating hyperplane with parameter $w$, let $d(w):=\|w-w^{opt}\|$ be the Euclidean distance between $w$ and $w^{opt}$, also called the bias of $w$. From the preceding KL inequality, we deduce that $(γ^{opt}-γ(w))/R \le d(w) \le 2\sqrt{(γ^{opt}-γ(w))/γ^{opt}}$, where $R:=\max_i \|x_i\|$ is the maximum distance of the points $x_i$ from the origin. Consequently, for any optimization algorithm (gradient-descent-based or not), the bias of the iterates converges at least as fast as the square root of the rate of convergence of their margin. Thus, our work provides a generic tool for analyzing the implicit bias of any algorithm in terms of its margin, in situations where a specialized analysis might not be available: it suffices to establish a good rate of convergence for the margin, a task which is usually much easier.
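As an illustrative sanity check (not from the paper), the sketch below verifies the two-sided bound $(γ^{opt}-γ(w))/R \le d(w) \le 2\sqrt{(γ^{opt}-γ(w))/γ^{opt}}$ numerically on a toy 2D dataset. It restricts attention to hyperplanes through the origin (an assumption; the abstract does not specify an offset), approximates $w^{opt}$ by a brute-force sweep over unit normals, and checks the bound for every separating direction on the grid. The dataset, grid resolution, and tolerance are all illustrative assumptions.

```python
import numpy as np

# Hedged numerical check of the abstract's bound
#   (gamma_opt - gamma(w)) / R <= ||w - w_opt|| <= 2*sqrt((gamma_opt - gamma(w)) / gamma_opt)
# on a toy 2D dataset; nothing here is taken from the paper itself.

rng = np.random.default_rng(0)

# Two linearly separable point clouds; colors encoded as labels y in {-1, +1}.
X = np.vstack([rng.normal([+2.0, 0.0], 0.3, size=(20, 2)),
               rng.normal([-2.0, 0.0], 0.3, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

# Candidate unit normals w(theta) = (cos theta, sin theta) for hyperplanes
# through the origin (a dense sweep is enough in 2D).
angles = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# gamma(w) = min_i y_i <w, x_i>: for unit w this is the signed distance of the
# closest point to the hyperplane, and it is positive iff w separates the colors.
margins = (y[:, None] * (X @ W.T)).min(axis=0)

w_opt = W[np.argmax(margins)]          # approximate max-margin direction w^{opt}
gamma_opt = margins.max()              # approximate gamma^{opt}
R = np.linalg.norm(X, axis=1).max()    # max distance of the points from the origin

sep = margins > 0                      # the bound only covers separating hyperplanes
d = np.linalg.norm(W[sep] - w_opt, axis=1)   # bias d(w) = ||w - w_opt||
gap = gamma_opt - margins[sep]               # margin suboptimality (>= 0 on the grid)

tol = 1e-4  # slack for the finite-grid approximation of w_opt and gamma_opt
assert np.all(gap / R <= d + tol), "lower bound violated"
assert np.all(d <= 2.0 * np.sqrt(gap / gamma_opt) + tol), "upper bound violated"
print(f"bound verified for {sep.sum()} separating directions "
      f"(gamma_opt ~ {gamma_opt:.3f}, R ~ {R:.3f})")
```

The lower bound holds here because $γ$, a minimum of linear functions with gradients bounded by $R$, is $R$-Lipschitz in $w$; the upper bound is the KL-type consequence stated in the abstract. A real experiment would replace the brute-force sweep with a hard-margin SVM solver.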