Paper Title

A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions

Paper Authors

Daniel Lundstrom, Tianjian Huang, Meisam Razaviyayn

Paper Abstract

As deep learning (DL) efficacy grows, concerns for poor model explainability grow also. Attribution methods address the issue of explainability by quantifying the importance of an input feature for a model prediction. Among various methods, Integrated Gradients (IG) sets itself apart by claiming other methods failed to satisfy desirable axioms, while IG and methods like it uniquely satisfy said axioms. This paper comments on fundamental aspects of IG and its applications/extensions: 1) We identify key differences between IG function spaces and the supporting literature's function spaces which problematize previous claims of IG uniqueness. We show that with the introduction of an additional axiom, non-decreasing positivity, the uniqueness claims can be established. 2) We address the question of input sensitivity by identifying function classes where IG is/is not Lipschitz in the attributed input. 3) We show that axioms for single-baseline methods have analogous properties for methods with probability distribution baselines. 4) We introduce a computationally efficient method of identifying internal neurons that contribute to specified regions of an IG attribution map. Finally, we present experimental results validating this method.
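For context, the single-baseline IG attribution discussed in the abstract assigns feature i the value (x_i − x'_i) times the integral of the model's gradient along the straight path from baseline x' to input x. Below is a minimal sketch of the standard Riemann-sum approximation of this formula; the names (integrated_gradients, f_grad) and the midpoint discretization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def integrated_gradients(f_grad, x, baseline, steps=64):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    f_grad:   callable returning dF/dx at a given input (same shape as x).
    x:        input being attributed.
    baseline: reference point x' (the single-baseline setting).
    """
    # Midpoints alpha_k of the straight-line path from baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    # Average the gradient along the path.
    avg_grad = np.zeros_like(x, dtype=float)
    for a in alphas:
        avg_grad += f_grad(baseline + a * (x - baseline))
    avg_grad /= steps
    # IG_i(x) = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a(x - x')) da
    return (x - baseline) * avg_grad

# Toy usage: F(x) = x[0]**2 + 3*x[1], so dF/dx = [2*x[0], 3].
f_grad = lambda x: np.array([2.0 * x[0], 3.0])
print(integrated_gradients(f_grad, np.array([1.0, 2.0]), np.zeros(2)))
```

On this toy function the attributions sum to F(x) − F(baseline) = 7, illustrating the completeness axiom; the distribution-baseline methods in point 3 of the abstract replace the single x' with an average of such attributions over baselines drawn from a probability distribution.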
