Paper Title
Is deeper better? It depends on locality of relevant features
Paper Authors
Paper Abstract
It has been recognized that heavily overparameterized artificial neural networks exhibit surprisingly good generalization performance in various machine-learning tasks. Recent theoretical studies have attempted to unveil the mystery of overparameterization. In most of those previous works, overparameterization is achieved by increasing the width of the network, while the effect of increasing the depth has remained less well understood. In this work, we investigate the effect of increasing the depth within the overparameterized regime. To gain insight into the advantage of depth, we introduce local and global labels as abstract but simple classification rules. It turns out that the locality of the relevant features for a given classification rule plays a key role; our experimental results suggest that deeper is better for local labels, whereas shallower is better for global labels. We also compare the results of finite networks with those of the neural tangent kernel (NTK), which is equivalent to an infinitely wide network with proper initialization and an infinitesimal learning rate. It is shown that the NTK does not correctly capture the depth dependence of the generalization performance, which indicates the importance of feature learning rather than lazy learning.
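To make the distinction between the two label types concrete, the following is a minimal sketch of how local and global classification rules could be generated on synthetic inputs. The specific constructions (a sign rule over a small window of coordinates versus a sign rule over all coordinates), the dimensions, and the function names are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

# Minimal sketch (assumed constructions, not the paper's exact definitions):
# a "local" label depends only on a few neighboring input coordinates,
# whereas a "global" label depends on every coordinate of the input.

rng = np.random.default_rng(0)
d, n = 64, 1000                      # input dimension and sample count (assumed)
X = rng.standard_normal((n, d))

def local_label(x, k=3):
    # Label determined by a small window of k adjacent coordinates.
    return 1 if x[:k].sum() > 0 else -1

def global_label(x):
    # Label determined by all d coordinates.
    return 1 if x.sum() > 0 else -1

y_local = np.array([local_label(xi) for xi in X])
y_global = np.array([global_label(xi) for xi in X])
```

With datasets of this form, one could compare deep versus shallow finite networks, as well as an NTK baseline, on the two label types, along the lines of the comparison described in the abstract.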