Paper Title
Signing the Supermask: Keep, Hide, Invert
Paper Authors
Paper Abstract
The exponential growth in the number of parameters of neural networks over the past years has been accompanied by an increase in performance across several fields. However, due to their sheer size, these networks have become not only difficult to interpret but also problematic to train and use in real-world applications, as hardware requirements have increased accordingly. Tackling both issues, we present a novel approach that either drops a neural network's initial weights or inverts their respective sign. Put simply, a network is trained by weight selection and inversion without changing the weights' absolute values. Our contribution extends previous work on masking by additionally sign-inverting the initial weights and follows the findings of the Lottery Ticket Hypothesis. Through this extension and adaptations of initialization methods, we achieve pruning rates of up to 99% while still matching or exceeding the performance of various baseline and previous models. Our approach has two main advantages. First, and most notably, signed Supermask models drastically simplify a model's structure while still performing well on the given tasks. Second, by reducing the neural network to its very foundation, we gain insights into which weights matter for performance. The code is available on GitHub.
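As a rough illustration of the idea described in the abstract (not the authors' released implementation), the sketch below shows how a signed Supermask could be realized in PyTorch for a single linear layer: the frozen initial weights are multiplied element-wise by a mask in {-1, 0, +1}, so each weight is either kept, dropped ("hidden"), or sign-inverted, and only the real-valued scores from which the mask is derived are trained. The class name SignedSupermaskLinear, the keep_threshold parameter, the score initialization, and the straight-through-estimator trick are all assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignedSupermaskLinear(nn.Module):
    """Illustrative linear layer whose frozen initial weights are kept, dropped, or sign-inverted.

    Effective weight = mask * w_init with mask in {-1, 0, +1}; the absolute values
    of the initial weights never change. Only the real-valued scores that the mask
    is derived from are trained (a hypothetical parameterization, for illustration).
    """

    def __init__(self, in_features, out_features, keep_threshold=0.05):
        super().__init__()
        # Frozen random initial weights; their magnitudes stay fixed during training.
        w_init = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w_init)
        self.register_buffer("w_init", w_init)
        # One trainable score per weight (initializing scores from w_init is an assumption).
        self.scores = nn.Parameter(w_init.clone())
        self.keep_threshold = keep_threshold

    def forward(self, x):
        # Discretize scores into {-1, 0, +1}: scores close to zero drop the weight,
        # otherwise the sign of the score decides between keeping and inverting it.
        mask = torch.sign(self.scores) * (self.scores.abs() > self.keep_threshold).float()
        # Straight-through estimator: the forward pass uses the discrete mask, while
        # gradients flow to the continuous scores as if the discretization were identity.
        mask = mask.detach() + self.scores - self.scores.detach()
        # Keep (+1), hide (0), or invert (-1) each frozen initial weight.
        return F.linear(x, self.w_init * mask)


if __name__ == "__main__":
    layer = SignedSupermaskLinear(784, 10)
    out = layer(torch.randn(32, 784))
    print(out.shape)  # torch.Size([32, 10])
```

In such a setup only layer.scores receives gradient updates; the trained network is then fully described by the frozen initial weights plus a ternary mask, which is one way the structural simplification and high pruning rates mentioned in the abstract could be exploited.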