Paper Title
A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors
Paper Authors
Paper Abstract
In this work we establish an algorithm- and distribution-independent non-asymptotic trade-off between the model size, excess test loss, and training loss of linear predictors. Specifically, we show that models that perform well on the test data (have low excess loss) are either "classical", with training loss close to the noise level, or "modern", with a number of parameters much larger than the minimum needed to fit the training data exactly. We also provide a more precise asymptotic analysis when the limiting spectral distribution of the whitened features is Marchenko-Pastur. Remarkably, while the Marchenko-Pastur analysis is far more precise near the interpolation peak, where the number of parameters is just enough to fit the training data, it coincides exactly with the distribution-independent bound as the level of overparametrization increases.
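The trade-off described in the abstract can be observed numerically. Below is a minimal sketch, not taken from the paper, using minimum-norm least squares on isotropic Gaussian features (a setting where the whitened feature spectrum converges to Marchenko-Pastur). The sample size n, noise level sigma, the sweep of model sizes d, and the use of the pseudoinverse are all assumptions of this illustration: training loss stays near the noise level while d < n ("classical"), collapses to zero at the interpolation point d = n where excess test loss peaks, and the excess test loss then decreases again as d grows well past n ("modern").

```python
# Minimal illustration (assumed setup, not the paper's experiments):
# trade-off between model size d, training loss, and excess test loss
# for minimum-norm linear predictors on isotropic Gaussian features.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200, 0.5  # training samples and noise level (assumed values)

for d in [50, 150, 190, 210, 400, 2000]:  # model sizes around the peak d = n
    # Ground-truth linear model with unit-norm coefficients.
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)

    X = rng.standard_normal((n, d))
    y = X @ w + sigma * rng.standard_normal(n)

    # Minimum-norm least-squares fit; interpolates the data when d >= n.
    w_hat = np.linalg.pinv(X) @ y

    train_loss = np.mean((X @ w_hat - y) ** 2)
    # For isotropic Gaussian test features, the excess test loss
    # E[(x @ (w_hat - w))^2] equals ||w_hat - w||^2.
    excess_test = np.linalg.norm(w_hat - w) ** 2
    print(f"d={d:5d}  train={train_loss:8.4f}  excess test={excess_test:8.4f}")
```

Running this sketch shows the two regimes the abstract names: for d well below n the training loss sits near sigma^2 with small excess test loss, while near d = n the training loss drops to zero and the excess test loss spikes, subsiding only as overparametrization increases.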