Paper Title

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth

Paper Authors

Thao Nguyen, Maithra Raghu, Simon Kornblith

Abstract

A key factor in the success of deep neural networks is the ability to scale models to improve performance by varying the architecture depth and width. This simple property of neural network design has resulted in highly effective architectures for a variety of tasks. Nevertheless, there is limited understanding of effects of depth and width on the learned representations. In this paper, we study this fundamental question. We begin by investigating how varying depth and width affects model hidden representations, finding a characteristic block structure in the hidden representations of larger capacity (wider or deeper) models. We demonstrate that this block structure arises when model capacity is large relative to the size of the training set, and is indicative of the underlying layers preserving and propagating the dominant principal component of their representations. This discovery has important ramifications for features learned by different models, namely, representations outside the block structure are often similar across architectures with varying widths and depths, but the block structure is unique to each model. We analyze the output predictions of different model architectures, finding that even when the overall accuracy is similar, wide and deep models exhibit distinctive error patterns and variations across classes.
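The layer-to-layer comparisons behind these findings rely on a representation-similarity measure; the paper's analysis is based on linear centered kernel alignment (CKA). The sketch below is a minimal NumPy implementation of linear CKA, assuming layer activations have already been collected as (examples × features) matrices; the `activations` list at the end is hypothetical and only illustrates how a layer-by-layer similarity heatmap (where the block structure appears) could be assembled.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two activation matrices.

    X: (n_examples, d1) activations from one layer.
    Y: (n_examples, d2) activations from another layer.
    Returns a similarity score in [0, 1].
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    similarity = np.linalg.norm(Y.T @ X, ord='fro') ** 2
    normalization = (np.linalg.norm(X.T @ X, ord='fro') *
                     np.linalg.norm(Y.T @ Y, ord='fro'))
    return similarity / normalization

# Hypothetical usage: compare every pair of layers within one model.
# A large contiguous region of high similarity in the heatmap corresponds
# to the "block structure" described in the abstract.
activations = [np.random.randn(256, 64) for _ in range(10)]  # placeholder data
heatmap = np.array([[linear_cka(a, b) for b in activations] for a in activations])
```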
