Paper Title


Beyond neural scaling laws: beating power law scaling via data pruning

Authors

Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

Abstract


Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scaling and potentially even reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
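
As an illustration of what such a pruning metric might look like in practice, below is a minimal sketch (not the authors' released code) of a self-supervised prototype-distance metric: embed every example with a self-supervised encoder, cluster the embeddings with k-means, and rank examples by their distance to the nearest cluster centroid. The function name `prune_by_prototype_distance`, the use of scikit-learn's k-means, the number of clusters, and the choice to keep the most distant (hardest) examples are all assumptions made for this sketch, not details stated in the abstract.

```python
# Minimal sketch of a self-supervised, prototype-distance pruning metric
# (an assumption-laden illustration, not the paper's released implementation).
import numpy as np
from sklearn.cluster import KMeans


def prune_by_prototype_distance(embeddings: np.ndarray, keep_fraction: float,
                                n_clusters: int = 100, seed: int = 0) -> np.ndarray:
    """Return indices of examples to keep after pruning.

    embeddings    : (n_examples, dim) array, e.g. from a self-supervised encoder.
    keep_fraction : fraction of the dataset to retain (0 < keep_fraction <= 1).
    n_clusters    : number of k-means prototypes (a free hyperparameter here).
    """
    kmeans = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    kmeans.fit(embeddings)

    # Distance of each example to its nearest prototype: small distance means a
    # prototypical ("easy") example, large distance means an atypical ("hard") one.
    distances = np.min(kmeans.transform(embeddings), axis=1)

    # One simple policy: keep the most distant examples and discard the most
    # prototypical ones first (which examples to keep can in general depend on
    # how aggressively the dataset is pruned).
    n_keep = int(len(embeddings) * keep_fraction)
    keep_idx = np.argsort(distances)[::-1][:n_keep]
    return np.sort(keep_idx)


if __name__ == "__main__":
    # Example usage with random stand-in embeddings.
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(10_000, 128))
    kept = prune_by_prototype_distance(fake_embeddings, keep_fraction=0.8)
    print(f"Kept {len(kept)} of {len(fake_embeddings)} examples")
```

In a real pipeline the stand-in embeddings would be replaced by features from a pretrained self-supervised model, and the kept indices would define the pruned training subset.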
