Paper Title
On Statistical Efficiency in Learning
Paper Authors
Paper Abstract
A central issue in many statistical learning problems is selecting an appropriate model from a set of candidate models. Large models tend to inflate the variance (overfitting), while small models tend to introduce bias (underfitting) for a given fixed dataset. In this work, we address the critical challenge of model selection: striking a balance between model fit and model complexity to gain reliable predictive power. We consider the task of approaching the theoretical limit of statistical learning, meaning that the selected model predicts as well as the best possible model in a class of potentially misspecified candidate models. We propose a generalized notion of Takeuchi's information criterion and prove that the proposed method asymptotically achieves the optimal out-of-sample prediction loss under reasonable assumptions. To the best of our knowledge, this is the first proof of the asymptotic property of Takeuchi's information criterion. Our proof applies to a wide variety of nonlinear models, loss functions, and high-dimensional settings (in the sense that model complexity can grow with the sample size). The proposed method can be used as a computationally efficient surrogate for leave-one-out cross-validation. Moreover, for modeling streaming data, we propose an online algorithm that sequentially expands the model complexity, enhancing selection stability and reducing computational cost. Experimental studies show that the proposed method has desirable predictive power at significantly less computational cost than some popular methods.
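To make the criterion concrete, below is a minimal sketch of the classical (non-generalized) Takeuchi information criterion for a Gaussian linear model with unit noise variance; the data-generating setup, the nested candidate subsets, and the function name tic_gaussian are illustrative assumptions, not the paper's generalized construction. TIC penalizes the in-sample log-likelihood by tr(J H^{-1}), where J estimates the covariance of the per-sample score and H the negative Hessian of the log-likelihood; under a correctly specified model the trace reduces to the parameter dimension, recovering AIC.

```python
# Illustrative sketch (assumed setup, not the paper's generalized criterion):
# classical Takeuchi information criterion (TIC) for a Gaussian linear model
# with unit noise variance, used to rank candidate feature subsets.
import numpy as np

def tic_gaussian(X, y):
    """TIC = -2 * log-likelihood + 2 * tr(J H^{-1}) at the MLE.

    J is the empirical covariance of the per-sample score and H the
    empirical negative Hessian; if the model is correctly specified,
    tr(J H^{-1}) is approximately dim(theta), recovering AIC.
    """
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # MLE under unit variance
    r = y - X @ beta                              # residuals
    loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum(r ** 2)
    H = X.T @ X / n                               # negative-Hessian estimate
    J = (X * r[:, None] ** 2).T @ X / n           # score-covariance estimate
    penalty = np.trace(J @ np.linalg.inv(H))
    return -2 * loglik + 2 * penalty

rng = np.random.default_rng(0)
n = 500
X_full = rng.standard_normal((n, 8))
y = X_full[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

# Select the candidate model (nested subsets here) with the smallest TIC.
scores = {k: tic_gaussian(X_full[:, :k], y) for k in range(1, 9)}
print(min(scores, key=scores.get), scores)
```

A natural sanity check is to compare the TIC-selected subset with the one chosen by leave-one-out cross-validation, which requires n refits per candidate; the abstract's claim is that TIC serves as a computationally efficient surrogate for that procedure.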