Paper Title

Sharp Representation Theorems for ReLU Networks with Precise Dependence on Depth

Paper Authors

Guy Bresler, Dheeraj Nagaraj

Paper Abstract

We prove sharp dimension-free representation results for neural networks with $D$ ReLU layers under square loss for a class of functions $\mathcal{G}_D$ defined in the paper. These results capture the precise benefits of depth in the following sense: 1. The rates for representing the class of functions $\mathcal{G}_D$ via $D$ ReLU layers are sharp up to constants, as shown by matching lower bounds. 2. For each $D$, $\mathcal{G}_{D} \subseteq \mathcal{G}_{D+1}$ and as $D$ grows the class of functions $\mathcal{G}_{D}$ contains progressively less smooth functions. 3. If $D^{\prime} < D$, then the approximation rate for the class $\mathcal{G}_D$ achieved by depth $D^{\prime}$ networks is strictly worse than that achieved by depth $D$ networks. This constitutes a fine-grained characterization of the representation power of feedforward networks of arbitrary depth $D$ and number of neurons $N$, in contrast to existing representation results which either require $D$ growing quickly with $N$ or assume that the function being represented is highly smooth. In the latter case similar rates can be obtained with a single nonlinear layer. Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions, and indeed, the main technical novelty is to fully exploit the fact that deep networks can produce highly oscillatory functions with few activation functions.
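The following is a minimal illustrative sketch, not a construction from the paper, of the fact highlighted at the end of the abstract: composing a two-ReLU "tent" map $D$ times produces a function with on the order of $2^D$ oscillations on $[0, 1]$ while using only $2D$ activation functions. The names `triangle` and `deep_triangle` are hypothetical helpers chosen for this example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def triangle(x):
    # Tent map on [0, 1] built from two ReLU units:
    # equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_triangle(x, depth):
    # Compose the tent map `depth` times: a depth-`depth` ReLU network with
    # only two activations per layer, yet ~2**depth monotone pieces on [0, 1].
    for _ in range(depth):
        x = triangle(x)
    return x

if __name__ == "__main__":
    xs = np.linspace(0.0, 1.0, 100001)
    for depth in (1, 3, 5, 8):
        ys = deep_triangle(xs, depth)
        # Count monotone pieces via sign changes of the discrete derivative.
        signs = np.sign(np.diff(ys))
        signs = signs[signs != 0]
        pieces = 1 + int(np.count_nonzero(np.diff(signs)))
        print(f"depth={depth}: ~{pieces} monotone pieces (theory: {2**depth})")
```

A shallow network, by contrast, needs roughly one activation per oscillation, which is the intuition behind the claim that depth helps in representing less smooth functions.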
