Paper Title
Extreme Memorization via Scale of Initialization
Paper Authors
Paper Abstract
We construct an experimental setup in which changing the scale of initialization strongly impacts the implicit regularization induced by SGD, interpolating from good generalization performance to completely memorizing the training set while making little progress on the test set. Moreover, we find that the extent and manner in which generalization ability is affected depends on the activation and loss function used, with $\sin$ activation demonstrating extreme memorization. In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function. Our empirical investigation reveals that increasing the scale of initialization correlates with misalignment of representations and gradients across examples in the same class. This insight allows us to devise an alignment measure over gradients and representations which can capture this phenomenon. We demonstrate that our alignment measure correlates with generalization of deep models trained on image classification tasks.
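The abstract does not spell out the exact form of the alignment measure. The sketch below illustrates one plausible reading, assuming the measure is an average pairwise cosine similarity of per-example gradients (or representations) between examples of the same class; the function name `within_class_alignment` and its details are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def within_class_alignment(vectors, labels):
    """Average pairwise cosine similarity of per-example vectors
    (gradients or representations) restricted to pairs that share
    a class label. Hypothetical form of the alignment measure."""
    # Normalize each vector to unit length.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    sims = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if len(idx) < 2:
            continue
        # Cosine similarities for all within-class pairs.
        gram = unit[idx] @ unit[idx].T
        upper = np.triu_indices(len(idx), k=1)
        sims.append(gram[upper])
    return float(np.mean(np.concatenate(sims)))

# Toy usage: per-example gradient vectors for two classes.
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 16))
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(within_class_alignment(grads, labels))
```

Under this reading, a value near 1 would indicate that same-class gradients or representations point in similar directions (the regime the abstract associates with good generalization), while values near 0 would indicate the misalignment associated with memorization at large initialization scales.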