Paper Title
Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization
Paper Authors
Paper Abstract
Neural network visualization techniques mark image locations by their relevancy to the network's classification. Existing methods are effective in highlighting the regions that most affect the resulting classification. However, as we show, these methods are limited in their ability to identify the support for alternative classifications, an effect we name {\em the saliency bias} hypothesis. In this work, we integrate two lines of research: gradient-based methods and attribution-based methods, and develop an algorithm that provides per-class explainability. The algorithm back-projects the per-pixel local influence in a manner that is guided by the local attributions, while correcting for salient features that would otherwise bias the explanation. In an extensive battery of experiments, we demonstrate the ability of our method to produce class-specific visualizations, and not just visualizations of the predicted label. Remarkably, the method obtains state-of-the-art results on benchmarks that are commonly applied to gradient-based methods, as well as on those that are employed mostly for evaluating attribution methods. Using a new unsupervised procedure, our method is also successful in demonstrating that self-supervised methods learn semantic information.
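To make the notion of per-class explainability concrete, the sketch below shows a much simpler baseline than the paper's attribution-guided algorithm: a gradient-times-input saliency map computed for an arbitrary target class rather than only for the predicted one. This is not the authors' method; it is a minimal, hedged illustration of what "class-specific visualization" means, assuming a standard torchvision classifier. The function name `class_specific_saliency` and the example class index are hypothetical.

```python
# Minimal sketch (NOT the paper's attribution-guided factorization):
# class-specific saliency via gradient x input. The key point mirrors the
# abstract: the explanation is computed for any chosen class, not just the
# argmax prediction.
import torch
import torchvision.models as models


def class_specific_saliency(model, x, target_class):
    """Gradient-times-input saliency map for an arbitrary target class.

    x: preprocessed input batch of shape (1, 3, H, W).
    target_class: index of the class to explain (need not be the prediction).
    """
    model.eval()
    x = x.clone().requires_grad_(True)
    logits = model(x)
    # Backpropagate only the chosen class score.
    logits[0, target_class].backward()
    # Gradient x input, summed over channels; ReLU keeps positive evidence.
    sal = (x.grad * x).sum(dim=1).relu()
    # Normalize to [0, 1] for visualization.
    sal = sal / (sal.max() + 1e-8)
    return sal.squeeze(0).detach()


if __name__ == "__main__":
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
    with torch.no_grad():
        pred = model(x).argmax(1).item()
    heatmap_pred = class_specific_saliency(model, x, pred)
    heatmap_alt = class_specific_saliency(model, x, 42)  # any alternative class
    print(heatmap_pred.shape, heatmap_alt.shape)
```

Under the saliency-bias hypothesis described above, such a plain gradient baseline tends to produce nearly the same map for `heatmap_pred` and `heatmap_alt`, dominated by the most salient object; the paper's contribution is an algorithm whose maps differ meaningfully across target classes.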