Paper Title

Sharpness-Aware Minimization for Efficiently Improving Generalization

Paper Authors

Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

Paper Abstract

In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by prior work connecting the geometry of the loss landscape and generalization, we introduce a novel, effective procedure for instead simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels. We open source our code at \url{https://github.com/google-research/sam}.
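For concreteness, the min-max problem described in the abstract can be written informally as min_w max_{||ε||₂ ≤ ρ} L(w + ε): minimize the worst-case training loss within a small neighborhood of radius ρ around the current parameters. Below is a minimal sketch of the resulting two-gradient update, assuming plain NumPy parameters and a user-supplied gradient function (`loss_grad` is a hypothetical placeholder, not part of the authors' released code).

```python
# Minimal sketch of one SAM update step (illustrative only; see the
# authors' repository at github.com/google-research/sam for the real code).
import numpy as np

def sam_step(w, loss_grad, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step.

    w         : current parameter vector (np.ndarray)
    loss_grad : callable returning dL/dw at a given parameter vector
    lr        : learning rate for the outer (descent) step
    rho       : radius of the neighborhood used in the inner maximization
    """
    g = loss_grad(w)
    # Inner maximization (first-order approximation): move to the point of
    # approximately highest loss within an L2 ball of radius rho around w.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Outer minimization: take a descent step using the gradient evaluated
    # at the perturbed point, applied to the original parameters.
    g_sharp = loss_grad(w + eps)
    return w - lr * g_sharp
```

Because the inner maximization is approximated to first order, each SAM step costs roughly two gradient evaluations instead of one, which is the efficiency trade-off the abstract alludes to.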
