Paper Title

Tighter risk certificates for neural networks

Paper Authors

María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, Csaba Szepesvári

Abstract

This paper presents an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks. These two training objectives are derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates for the learnt predictors, based on part of the data used to learn the predictors. We further experiment with different types of priors on the weights (both data-free and data-dependent priors) and neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature, showing promise not only to guide the learning algorithm through bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of using the whole data set for learning a predictor and certifying its risk on any unseen data (from the same distribution as the training data) potentially without the need for holding out test data.
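To make the idea of bound-based training more concrete, below is a minimal sketch (not taken from the paper's code) of a classical McAllester-style PAC-Bayes bound of the kind the abstract refers to as the "classical PAC-Bayes bound": the empirical risk of the posterior Q over weights is penalised by the KL divergence to the prior P, and the resulting quantity can serve both as a training objective and as a (loose) risk certificate. The function name, its arguments, and the example values of delta, KL, and n are illustrative assumptions, not the authors' implementation.

```python
import math

def pac_bayes_classic_bound(emp_risk, kl_q_p, n, delta=0.05):
    """McAllester-style PAC-Bayes bound, usable as a training objective
    or as a risk certificate for a posterior Q over network weights.

    emp_risk : empirical risk of Q on the n training examples
    kl_q_p   : KL(Q || P), posterior-to-prior divergence
    n        : number of training examples
    delta    : confidence parameter; the bound holds with prob. >= 1 - delta

    Bound: L(Q) <= L_hat(Q) + sqrt( (KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n) )
    """
    complexity = (kl_q_p + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    return emp_risk + math.sqrt(complexity)

# Illustrative values only: empirical risk 0.02, KL of 5000 nats, 60000 examples.
print(pac_bayes_classic_bound(0.02, 5000.0, 60000))
```

The two new training objectives studied in the paper are derived from tighter PAC-Bayes bounds than this classical form; the sketch above is only meant to show how a bound trades off empirical risk against the KL complexity term during training.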
