Paper Title
Network size and weights size for memorization with two-layers neural networks
Paper Authors
Paper Abstract
In 1988, Eric B. Baum showed that two-layers neural networks with a threshold activation function can perfectly memorize the binary labels of $n$ points in general position in $\mathbb{R}^d$ using only $\lceil n/d \rceil$ neurons. We observe that with ReLU networks, using four times as many neurons one can fit arbitrary real labels. Moreover, for approximate memorization up to error $\epsilon$, the neural tangent kernel can also memorize with only $O\left(\frac{n}{d} \cdot \log(1/\epsilon)\right)$ neurons (assuming that the data is well dispersed). We show, however, that these constructions give rise to networks where the magnitudes of the neurons' weights are far from optimal. In contrast, we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly optimal size of the weights.
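As a concrete illustration of the neural-tangent-kernel memorization statement above, the following is a minimal numerical sketch, not the paper's construction: it fits arbitrary real labels of $n$ well-dispersed points in $\mathbb{R}^d$ by least squares over the first-layer tangent features of a two-layer ReLU network with $m = O(n/d)$ neurons, so that $m \cdot d \ge n$ linearized parameters are available. The specific sizes, the random data model, and all variable names are assumptions chosen for the demo.

```python
# Illustrative sketch (assumptions only, not the paper's procedure): approximate
# memorization of n points via the linearized (NTK) model of a two-layer ReLU
# network with O(n/d) hidden neurons, fitted by least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50                        # n data points in R^d
m = 4 * int(np.ceil(n / d))           # a small constant times n/d neurons, so m*d >= n

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # "well dispersed": points on the unit sphere
y = rng.standard_normal(n)                      # arbitrary real labels

W = rng.standard_normal((m, d))       # random first-layer weights (fix the activation pattern)
a = rng.choice([-1.0, 1.0], size=m)   # random second-layer signs

# Tangent features w.r.t. the first-layer weights: phi(x) = [a_j * 1{w_j . x > 0} * x]_j
acts = (X @ W.T > 0).astype(float)              # n x m activation pattern
Phi = (acts * a)[:, :, None] * X[:, None, :]    # n x m x d per-neuron gradients
Phi = Phi.reshape(n, m * d)                     # n x (m*d) feature matrix

theta, *_ = np.linalg.lstsq(Phi, y, rcond=None) # fit the labels in the linearized model
print("max residual:", np.max(np.abs(Phi @ theta - y)))  # ~0 for generic data when m*d >= n
```

The sketch only verifies that $m \cdot d \ge n$ tangent features suffice to interpolate generic data; it says nothing about the magnitude of the resulting weights, which is precisely the quantity the paper argues is far from optimal for such constructions.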