Paper Title

Exploiting the Relationship Between Kendall's Rank Correlation and Cosine Similarity for Attribution Protection

Paper Authors

Fan Wang, Adams Wai-Kin Kong

Paper Abstract

Model attributions are important in deep neural networks as they aid practitioners in understanding the models, but recent studies reveal that attributions can be easily perturbed by adding imperceptible noise to the input. The non-differentiable Kendall's rank correlation is a key performance index for attribution protection. In this paper, we first show that the expected Kendall's rank correlation is positively correlated to cosine similarity and then indicate that the direction of attribution is the key to attribution robustness. Based on these findings, we explore the vector space of attribution to explain the shortcomings of attribution defense methods using $\ell_p$ norm and propose integrated gradient regularizer (IGR), which maximizes the cosine similarity between natural and perturbed attributions. Our analysis further exposes that IGR encourages neurons with the same activation states for natural samples and the corresponding perturbed samples, which is shown to induce robustness to gradient-based attribution methods. Our experiments on different models and datasets confirm our analysis on attribution protection and demonstrate a decent improvement in adversarial robustness.
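As a rough illustration of the idea described in the abstract (this is a minimal sketch, not the authors' released implementation), the code below assumes a PyTorch classifier and shows how a cosine-similarity penalty between integrated-gradients attributions of a natural input and its perturbed counterpart could be added to a standard cross-entropy objective. The function names, the number of integration steps, and the trade-off weight `lam` are hypothetical choices made for this example.

```python
# Hypothetical sketch (not the paper's released code): an IGR-style objective that
# rewards directional agreement between natural and perturbed attributions.
import torch
import torch.nn.functional as F

def integrated_gradients(model, x, target, baseline=None, steps=16):
    # Riemann approximation of integrated gradients along the straight path
    # from `baseline` (default: all zeros) to the input `x`.
    if baseline is None:
        baseline = torch.zeros_like(x)
    total_grad = torch.zeros_like(x)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        score = model(point)[torch.arange(x.size(0)), target].sum()
        # create_graph=True keeps the attribution differentiable w.r.t. the
        # model parameters, so it can be used inside a training loss.
        grad = torch.autograd.grad(score, point, create_graph=True)[0]
        total_grad = total_grad + grad
    return (x - baseline) * total_grad / steps

def igr_style_loss(model, x_nat, x_adv, target, lam=1.0):
    # Classification loss on the natural input, plus a penalty that is small
    # when the natural and perturbed attributions point in the same direction
    # (cosine similarity close to 1). `lam` is a hypothetical trade-off weight.
    ce = F.cross_entropy(model(x_nat), target)
    a_nat = integrated_gradients(model, x_nat, target).flatten(1)
    a_adv = integrated_gradients(model, x_adv, target).flatten(1)
    cos = F.cosine_similarity(a_nat, a_adv, dim=1).mean()
    return ce + lam * (1.0 - cos)
```

At evaluation time, the non-differentiable Kendall rank correlation (for example, scipy.stats.kendalltau applied to the flattened natural and perturbed attributions) can then serve as the performance index mentioned in the abstract.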
