Title
Wide and Deep Neural Networks Achieve Optimality for Classification
Authors
Abstract
While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are optimal for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that achieve optimality. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and Neural Tangent Kernels, we provide explicit activation functions that can be used to construct networks that achieve optimality. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: (1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); (2) majority vote (model predictions are given by the label of the class with greatest representation in the training set); or (3) singular kernel classifiers (a set of classifiers containing those that achieve optimality). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
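Two of the three classifiers in the taxonomy above, 1-nearest neighbor and majority vote, are classical and simple to state concretely. The sketch below is purely illustrative (the data and function names are invented, not from the paper) and shows the limiting behaviors the abstract describes: a 1-NN classifier predicts the label of the closest training point, while a majority-vote classifier ignores the input entirely and always predicts the most represented class.

```python
# Illustrative sketches of two classifiers named in the taxonomy.
# The data and helper names here are hypothetical examples, not from the paper.
from collections import Counter

def one_nearest_neighbor(train_X, train_y, x):
    """Predict the label of the training point closest to x (1-NN)."""
    dists = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in train_X]
    return train_y[min(range(len(train_X)), key=dists.__getitem__)]

def majority_vote(train_y):
    """Predict the most common label in the training set, ignoring the input."""
    return Counter(train_y).most_common(1)[0][0]

# Toy training set: three labeled 2-D points.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
y = ["a", "a", "b"]

print(one_nearest_neighbor(X, y, (0.9, 1.1)))  # nearest point is (1.0, 1.0) -> "b"
print(majority_vote(y))                        # "a" has the greatest representation
```

The third category, singular kernel classifiers, has no comparably compact closed form; per the abstract, it is the family that contains the optimal (Bayes-consistent) classifiers.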