Paper Title
Generalization Bounds via Convex Analysis
Paper Authors
Paper Abstract
Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and output, given that the loss of any fixed hypothesis has a subgaussian tail. In this work, we generalize this result beyond the standard choice of Shannon's mutual information to measure the dependence between the input and the output. Our main result shows that it is indeed possible to replace the mutual information by any strongly convex function of the joint input-output distribution, with the subgaussianity condition on the losses replaced by a bound on an appropriately chosen norm capturing the geometry of the dependence measure. This allows us to derive a range of generalization bounds that are either entirely new or strengthen previously known ones. Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance, which are applicable to heavy-tailed loss distributions and highly smooth loss functions, respectively. Our analysis is based entirely on elementary tools from convex analysis, tracking the growth of a potential function associated with the dependence measure and the loss function.
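For context, a minimal restatement of the classical information-theoretic bound that the abstract refers to; the notation ($n$, $\sigma$, $S$, $W$) is a standard formulation of the Xu and Raginsky (2017) result and is not taken verbatim from this paper. If the training sample $S = (Z_1, \ldots, Z_n)$ is drawn i.i.d. from $\mu$, the learning algorithm outputs a hypothesis $W$, and the loss $\ell(w, Z)$ is $\sigma$-subgaussian under $Z \sim \mu$ for every fixed $w$, then the expected generalization error satisfies
$$
\bigl|\mathbb{E}\bigl[\mathrm{gen}(\mu, P_{W\mid S})\bigr]\bigr| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S; W)},
$$
where $I(S; W)$ is the Shannon mutual information between the input $S$ and the output $W$. The main result described in the abstract replaces $I(S; W)$ by other strongly convex functions of the joint distribution of $(S, W)$, with the subgaussianity condition traded for a norm bound matched to the chosen dependence measure.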