Paper Title

Smoothed Geometry for Robust Attribution

Paper Authors

Zifan Wang, Haofan Wang, Shakul Ramkumar, Matt Fredrikson, Piotr Mardziel, Anupam Datta

Abstract

Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs. This lack of robustness is especially problematic in high-stakes applications where adversarially-manipulated explanations could impair safety and trustworthiness. Building on a geometric understanding of these attacks presented in recent work, we identify Lipschitz continuity conditions on models' gradient that lead to robust gradient-based attributions, and observe that smoothness may also be related to the ability of an attack to transfer across multiple attribution methods. To mitigate these attacks in practice, we propose an inexpensive regularization method that promotes these conditions in DNNs, as well as a stochastic smoothing technique that does not require re-training. Our experiments on a range of image models demonstrate that both of these mitigations consistently improve attribution robustness, and confirm the role that smooth geometry plays in these attacks on real, large-scale models.
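The stochastic smoothing mitigation described in the abstract — averaging gradient-based attributions over randomly perturbed copies of the input, so that no retraining is needed — can be sketched roughly as follows. This is an illustrative toy, not the paper's exact method: the logistic model, Gaussian noise scale `sigma`, and sample count `n_samples` are all assumptions chosen for clarity.

```python
import numpy as np

def grad_attribution(x, w):
    """Plain gradient (saliency) attribution for a toy logistic model
    f(x) = sigmoid(w . x), computed analytically via the chain rule."""
    s = 1.0 / (1.0 + np.exp(-w @ x))
    return s * (1.0 - s) * w  # d f / d x

def smoothed_attribution(x, w, sigma=0.1, n_samples=50, seed=0):
    """Stochastic smoothing: average the gradient attribution over
    Gaussian-perturbed copies of the input x."""
    rng = np.random.default_rng(seed)
    grads = [grad_attribution(x + rng.normal(0.0, sigma, size=x.shape), w)
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)

x = np.array([1.0, -2.0, 0.5])   # example input (hypothetical)
w = np.array([0.5, 1.0, -1.5])   # example model weights (hypothetical)
attr = smoothed_attribution(x, w, sigma=0.05, n_samples=20)
```

Averaging over noise flattens sharp local variation in the model's gradient, which is the geometric property the attacks exploit; for a small `sigma` the smoothed attribution stays close to the plain gradient while being less sensitive to adversarial input perturbations.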
