Paper Title
The Implicit Bias of Benign Overfitting
Paper Authors
Paper Abstract
The phenomenon of benign overfitting, where a predictor perfectly fits noisy training data while attaining near-optimal expected loss, has received much attention in recent years, but still remains not fully understood beyond well-specified linear regression setups. In this paper, we provide several new results on when one can or cannot expect benign overfitting to occur, for both regression and classification tasks. We consider a prototypical and rather generic data model for benign overfitting of linear predictors, where an arbitrary input distribution of some fixed dimension $k$ is concatenated with a high-dimensional distribution. For linear regression which is not necessarily well-specified, we show that the minimum-norm interpolating predictor (that standard training methods converge to) is biased towards an inconsistent solution in general, hence benign overfitting will generally not occur. Moreover, we show how this can be extended beyond standard linear regression, by an argument proving how the existence of benign overfitting on some regression problems precludes its existence on other regression problems. We then turn to classification problems, and show that the situation there is much more favorable. Specifically, we prove that the max-margin predictor (to which standard training methods are known to converge in direction) is asymptotically biased towards minimizing a weighted \emph{squared hinge loss}. This allows us to reduce the question of benign overfitting in classification to the simpler question of whether this loss is a good surrogate for the misclassification error, and use it to show benign overfitting in some new settings.
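For context, here is a minimal sketch of the two predictors the abstract refers to, using their standard definitions (the notation $x_i \in \mathbb{R}^d$, $y_i$ for the $n$ training examples is assumed here for illustration, not taken from the paper). The minimum-norm interpolating predictor $\hat{w}_{\mathrm{MN}}$ (regression) and the max-margin predictor $\hat{w}_{\mathrm{MM}}$ (classification) are

\[
\hat{w}_{\mathrm{MN}} \;=\; \operatorname*{arg\,min}_{w \in \mathbb{R}^d} \|w\|_2 \ \ \text{s.t.}\ \ x_i^\top w = y_i \ \ \forall i,
\qquad
\hat{w}_{\mathrm{MM}} \;=\; \operatorname*{arg\,min}_{w \in \mathbb{R}^d} \|w\|_2 \ \ \text{s.t.}\ \ y_i\, x_i^\top w \ge 1 \ \ \forall i,
\]

and the (unweighted) squared hinge loss on an example $(x, y)$, whose weighted variant the abstract mentions, is $\ell(w; (x, y)) = \max\{0,\, 1 - y\, x^\top w\}^2$.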