Paper Title
Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient
Paper Authors
Paper Abstract
The strong {\it lottery ticket hypothesis} (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. \cite{MalachEtAl20} establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width $d$ and depth $l$, by pruning a random one that is a factor $O(d^4l^2)$ wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experimental research that achieves good approximation with networks that are a small factor wider than the target. In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets. We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep. Our analysis heavily relies on connecting pruning random ReLU networks to random instances of the \textsc{SubsetSum} problem. We then show that this logarithmic over-parameterization is essentially optimal for constant depth networks. Finally, we verify several of our theoretical insights with experiments.
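As a rough illustration of the \textsc{SubsetSum} connection described above (a toy sketch, not the authors' actual construction), the snippet below approximates a single target weight by brute-forcing the subset of a handful of random weights whose sum is closest to it; the error shrinks rapidly as the number of available random weights grows. The function name `best_subset_sum` and all parameter values are hypothetical choices for this demonstration.

```python
import itertools
import random

def best_subset_sum(samples, target):
    """Brute-force the subset of `samples` whose sum is closest to `target`."""
    best_err, best_subset = abs(target), ()  # empty subset (sum = 0) as baseline
    for r in range(1, len(samples) + 1):
        for subset in itertools.combinations(samples, r):
            err = abs(sum(subset) - target)
            if err < best_err:
                best_err, best_subset = err, subset
    return best_subset, best_err

random.seed(0)
target = 0.37                 # a single target weight in [-1, 1]
for n in (4, 8, 12, 16):      # number of random weights available to "prune" from
    samples = [random.uniform(-1, 1) for _ in range(n)]
    _, err = best_subset_sum(samples, target)
    print(f"n = {n:2d}  approximation error = {err:.2e}")
```

The point of the sketch is only that a logarithmic number of random candidates already gives a very fine approximation of one fixed weight; the paper's analysis applies this kind of random \textsc{SubsetSum} argument to every weight of the target network via pruning a wider, twice-as-deep random ReLU network.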