Paper Title

Width is Less Important than Depth in ReLU Neural Networks

Authors

Gal Vardi, Gilad Yehudai, Ohad Shamir

Abstract

We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network's architecture), whose number of parameters is essentially larger only by a linear factor. In light of previous depth separation theorems, which imply that a similar result cannot hold when the roles of width and depth are interchanged, it follows that depth plays a more significant role than width in the expressive power of neural networks. We extend our results to constructing networks with bounded weights, and to constructing networks with width at most $d+2$, which is close to the minimal possible width due to previous lower bounds. Both of these constructions cause an extra polynomial factor in the number of parameters over the target network. We also show an exact representation of wide and shallow networks using deep and narrow networks which, in certain cases, does not increase the number of parameters over the target network.
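The last claim — that a wide, shallow network can in some cases be represented exactly by a deep, narrow one — can be illustrated numerically. The sketch below is not the paper's construction; it is a simplified toy version under extra assumptions (nonnegative inputs and nonnegative output weights, so ReLU acts as the identity on the channels carried forward). A one-hidden-layer target net with k neurons in dimension d is rewritten as a deep net of width d+2: d channels copy the input, one channel accumulates the output, and one scratch channel evaluates one target neuron at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 3, 5

# Random shallow target net: f(x) = sum_i a[i] * relu(W[i] @ x + b[i]).
# Toy assumptions (not the paper's general setting): x >= 0 and a >= 0,
# so every channel we need to pass through a layer stays nonnegative.
W = rng.normal(size=(k, d))
b = rng.normal(size=k)
a = rng.uniform(0.5, 1.5, size=k)  # nonnegative output weights

def relu(z):
    return np.maximum(z, 0.0)

def shallow(x):
    return a @ relu(W @ x + b)

def deep_narrow(x):
    # Width-(d+2) deep net: state = (x_1..x_d, accumulator s, scratch t).
    # Two layers per target neuron: first t = relu(w_i . x + b_i),
    # then s = relu(s + a_i * t).
    state = np.concatenate([x, [0.0, 0.0]])   # s = t = 0
    for i in range(k):
        # Layer 2i: keep x and s, write neuron i into the scratch channel.
        A = np.zeros((d + 2, d + 2)); c = np.zeros(d + 2)
        A[:d, :d] = np.eye(d)        # pass x through (x >= 0, so relu = id)
        A[d, d] = 1.0                # pass accumulator through (s >= 0)
        A[d + 1, :d] = W[i]; c[d + 1] = b[i]
        state = relu(A @ state + c)
        # Layer 2i+1: fold the scratch channel into the accumulator.
        A = np.zeros((d + 2, d + 2)); c = np.zeros(d + 2)
        A[:d, :d] = np.eye(d)
        A[d, d] = 1.0; A[d, d + 1] = a[i]
        state = relu(A @ state + c)
    return state[d]

x = rng.uniform(0.0, 1.0, size=d)  # nonnegative test input
print(shallow(x), deep_narrow(x))  # the two nets agree exactly
```

Depth grows linearly in k while width stays at d+2, matching the abstract's trade-off: the parameter count increases only polynomially, and here the representation is exact rather than approximate.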
