Paper Title
Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study
Paper Authors
Paper Abstract
The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms. This notion refers to the tendency of the optimization algorithm towards a certain structured solution that often generalizes well. Recently, several papers have studied implicit regularization and were able to identify this phenomenon in various scenarios. We revisit this paradigm in arguably the simplest non-trivial setup, and study the implicit bias of Stochastic Gradient Descent (SGD) in the context of Stochastic Convex Optimization. As a first step, we provide a simple construction that rules out the existence of a \emph{distribution-independent} implicit regularizer that governs the generalization ability of SGD. We then demonstrate a learning problem that rules out a very general class of \emph{distribution-dependent} implicit regularizers from explaining generalization, a class that includes strongly convex regularizers as well as non-degenerate norm-based regularizers. Certain aspects of our constructions point to significant difficulties in providing a comprehensive explanation of an algorithm's generalization performance by solely arguing about its implicit regularization properties.
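To make the notion of implicit bias concrete, the following sketch (an illustration of the general concept, not a construction from this paper) shows a classic textbook example: on an underdetermined least-squares problem, SGD initialized at zero keeps its iterates in the row span of the data, so it converges to the \emph{minimum-norm} interpolating solution rather than an arbitrary one; the choice of problem size and step size here is our own.

```python
import numpy as np

# Illustrative example of implicit bias (not from the paper): on an
# underdetermined, consistent least-squares problem, SGD started at zero
# stays in the row span of the data matrix and so converges to the
# minimum-Euclidean-norm interpolating solution.
rng = np.random.default_rng(0)
n, d = 5, 20                               # fewer samples than parameters
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)             # realizable (consistent) labels

w = np.zeros(d)                            # zero initialization is essential
lr = 0.5 / max(np.dot(a, a) for a in A)    # step below per-sample smoothness
for _ in range(2000):
    i = rng.integers(n)                    # draw one sample
    w -= lr * (A[i] @ w - b[i]) * A[i]     # stochastic gradient step

w_min_norm = np.linalg.pinv(A) @ b         # minimum-norm interpolant
print(np.allclose(w, w_min_norm, atol=1e-4))
```

Among the infinitely many solutions with zero training loss, SGD selects the one of smallest Euclidean norm; the paper's point is precisely that such regularizer-based accounts of SGD's generalization do not extend to stochastic convex optimization in general.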