Paper Title

Generalization bounds for deep learning

Authors

Guillermo Valle-Pérez, Ard A. Louis

Abstract

Generalization in deep learning has been the topic of much recent theoretical and empirical research. Here we introduce desiderata for techniques that predict generalization errors for deep learning models in supervised learning. Such predictions should 1) scale correctly with data complexity; 2) scale correctly with training set size; 3) capture differences between architectures; 4) capture differences between optimization algorithms; 5) be quantitatively not too far from the true error (in particular, be non-vacuous); 6) be efficiently computable; and 7) be rigorous. We focus on generalization error upper bounds, and introduce a categorisation of bounds depending on assumptions on the algorithm and data. We review a wide range of existing approaches, from classical VC dimension to recent PAC-Bayesian bounds, commenting on how well they perform against the desiderata. We next use a function-based picture to derive a marginal-likelihood PAC-Bayesian bound. This bound is, by one definition, optimal up to a multiplicative constant in the asymptotic limit of large training sets, as long as the learning curve follows a power law, which is typically found in practice for deep learning problems. Extensive empirical analysis demonstrates that our marginal-likelihood PAC-Bayes bound fulfills desiderata 1-3 and 5. The results for 6 and 7 are promising, but not yet fully conclusive, while only desideratum 4 is currently beyond the scope of our bound. Finally, we comment on why this function-based bound performs significantly better than current parameter-based PAC-Bayes bounds.
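
For orientation, the marginal-likelihood bound mentioned in the abstract can be sketched as follows. In the realizable setting (zero training error), the Langford-Seeger PAC-Bayes theorem, with the posterior Q taken to be the prior restricted to functions consistent with the training set S of size m, gives a bound in terms of the marginal likelihood P(S), i.e. the prior probability mass of functions that fit the data. This is the standard form from the PAC-Bayes literature and may differ in detail from the exact statement in the paper:

-\ln\bigl(1 - \epsilon(Q)\bigr) \le \frac{\ln(1/P(S)) + \ln(2m/\delta)}{m - 1}

Here \epsilon(Q) is the expected generalization error of Q and 1 - \delta is the confidence level. If -\ln P(S) grows sublinearly with m, the bound decays as a power law in m, consistent with the learning-curve behaviour the abstract refers to. A minimal numeric sketch (the marginal-likelihood scaling below is a hypothetical placeholder, not a measurement from the paper):

import math

def pac_bayes_error_bound(log_marginal_likelihood, m, delta=0.05):
    # Realizable PAC-Bayes bound (Langford-Seeger form): solves
    # -ln(1 - eps) <= (ln(1/P(S)) + ln(2m/delta)) / (m - 1) for eps.
    rhs = (-log_marginal_likelihood + math.log(2 * m / delta)) / (m - 1)
    return 1 - math.exp(-rhs)

# Hypothetical sublinear scaling ln P(S) ~ -0.5 * m**0.7 yields a
# power-law-like decay of the bound as the training set size m grows.
for m in (10**3, 10**4, 10**5):
    print(m, pac_bayes_error_bound(log_marginal_likelihood=-0.5 * m**0.7, m=m))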
