Paper Title


On the Implicit Bias Towards Minimal Depth of Deep Neural Networks

Authors

Tomer Galanti, Liane Galanti, Ido Ben-Shaul

Abstract


Recent results in the literature suggest that the penultimate (second-to-last) layer representations of neural networks trained for classification exhibit a clustering property called neural collapse (NC). We study the implicit bias of stochastic gradient descent (SGD) in favor of low-depth solutions when training deep neural networks. We characterize a notion of effective depth that measures the first layer at which sample embeddings are separable using the nearest-class-center classifier. Furthermore, we hypothesize, and empirically show, that SGD implicitly selects neural networks of small effective depth. Second, while neural collapse emerges even when generalization should be impossible, we argue that the degree of separability in the intermediate layers is related to generalization. We derive a generalization bound based on comparing the effective depth of the network with the minimal depth required to fit the same dataset with partially corrupted labels. Remarkably, this bound provides non-trivial estimates of the test performance. Finally, we empirically show that the effective depth of a trained neural network increases monotonically with the number of randomly labeled samples in the data.
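The abstract defines effective depth as the first layer whose embeddings are separable under a nearest-class-center (NCC) classifier. A minimal sketch of that measurement is below, assuming per-layer embeddings are available as NumPy arrays; the separability tolerance `eps` and the function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def ncc_accuracy(embeddings, labels):
    """Training accuracy of the nearest-class-center classifier on one layer."""
    classes = np.unique(labels)
    # one class center per class: the mean embedding of that class's samples
    centers = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # distance of every sample to every class center
    dists = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=2)
    preds = classes[np.argmin(dists, axis=1)]
    return float((preds == labels).mean())

def effective_depth(layer_embeddings, labels, eps=0.05):
    """First (1-indexed) layer whose embeddings are (1 - eps)-separable under NCC.

    `layer_embeddings` is a list of (n_samples, dim_l) arrays, one per layer.
    The tolerance eps is an illustrative choice. If no layer reaches the
    threshold, the total number of layers is returned as a fallback.
    """
    for depth, emb in enumerate(layer_embeddings, start=1):
        if ncc_accuracy(emb, labels) >= 1.0 - eps:
            return depth
    return len(layer_embeddings)
```

In practice one would collect `layer_embeddings` with forward hooks on the trained network; the abstract's hypothesis corresponds to this function returning a value well below the architectural depth on cleanly labeled data, and growing as labels are corrupted.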
