Paper Title
Improving Generalization via Uncertainty Driven Perturbations
Paper Authors
Paper Abstract
Recently, Shah et al. (2020) pointed out the pitfalls of the simplicity bias - the tendency of gradient-based algorithms to learn simple models - which include the model's high sensitivity to small input perturbations, as well as sub-optimal margins. In particular, while Stochastic Gradient Descent yields a max-margin boundary on linear models, this guarantee does not extend to non-linear models. To mitigate the simplicity bias, we consider uncertainty-driven perturbations (UDP) of the training data points, obtained iteratively by following the direction that maximizes the model's estimated uncertainty. The uncertainty estimate does not rely on the input's label, is highest at the decision boundary, and - unlike loss-driven perturbations - allows for a larger range of values for the perturbation magnitude. Furthermore, since real-world datasets have non-isotropic distances between data points of different classes, this property is particularly appealing for increasing the margin of the decision boundary, which in turn improves the model's generalization. We show that UDP is guaranteed to achieve the maximum-margin decision boundary on linear models and that it notably increases the margin on challenging simulated datasets. For non-linear models, we show empirically that UDP reduces the simplicity bias and learns more exhaustive features. Interestingly, it also achieves a robustness-generalization trade-off competitive with loss-based perturbations on several datasets.
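
To make the described procedure concrete, below is a minimal PyTorch sketch of uncertainty-driven perturbations for a classifier. It uses softmax entropy as the label-free uncertainty estimate and ascends its input gradient; the specific update rule, the L-infinity projection, and the hyperparameter names (step_size, num_steps, epsilon) are illustrative assumptions, not necessarily the paper's exact formulation.

import torch
import torch.nn.functional as F

def predictive_entropy(logits):
    # Label-free uncertainty estimate: entropy of the softmax output,
    # which is highest near the decision boundary.
    probs = F.softmax(logits, dim=1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

def uncertainty_driven_perturbation(model, x, step_size=0.01, num_steps=10, epsilon=0.1):
    # Iteratively move each input along the gradient direction that increases
    # the model's estimated uncertainty, projected onto an L-inf ball (assumed here).
    x_pert = x.clone().detach()
    for _ in range(num_steps):
        x_pert.requires_grad_(True)
        entropy = predictive_entropy(model(x_pert)).sum()
        grad, = torch.autograd.grad(entropy, x_pert)
        with torch.no_grad():
            x_pert = x_pert + step_size * grad.sign()
            x_pert = x + (x_pert - x).clamp(-epsilon, epsilon)
        x_pert = x_pert.detach()
    return x_pert

def udp_training_step(model, optimizer, x, y):
    # Train on the perturbed points with their original labels; the
    # perturbation itself never used the labels.
    x_pert = uncertainty_driven_perturbation(model, x)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_pert), y)
    loss.backward()
    optimizer.step()
    return loss.item()

Because the uncertainty estimate is label-free, the perturbation budget epsilon can in principle be taken larger than in loss-driven (adversarial) training without pushing points onto incorrect labels, which is the property the abstract highlights.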