Paper Title

What Do You See? Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors

Authors

Yi-Shan Lin, Wen-Chuan Lee, Z. Berkay Celik

Abstract

Explainable AI (XAI) methods have been proposed to interpret how a deep neural network predicts inputs, using model saliency explanations that highlight the parts of an input deemed important to the decision for a specific target. However, it remains challenging to quantify the correctness of their interpretability, as current evaluation approaches either require subjective input from humans or incur high computation costs with automated evaluation. In this paper, we propose using backdoor trigger patterns--hidden malicious functionalities that cause misclassification--to automate the evaluation of saliency explanations. Our key observation is that triggers provide ground truth for inputs, allowing us to evaluate whether the regions identified by an XAI method are truly relevant to its output. Since backdoor triggers are the most important features causing deliberate misclassification, a robust XAI method should reveal their presence at inference time. We introduce three complementary metrics for the systematic evaluation of the explanations an XAI method generates, and evaluate seven state-of-the-art model-free and model-specific post-hoc methods through 36 models trojaned with specifically crafted triggers using color, shape, texture, location, and size. We find that the six methods that use local explanation and feature relevance fail to completely highlight trigger regions, and only a model-free approach can uncover the entire trigger region.
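The core idea of the evaluation can be sketched concisely: because the trigger's pixel region is known at training time, it serves as a ground-truth mask against which a saliency map can be scored. The following is a minimal illustrative sketch, not the paper's actual metrics; the function name, threshold, and the IoU/recall/precision scores are assumptions chosen for clarity.

```python
import numpy as np

def trigger_overlap_metrics(saliency, trigger_mask, threshold=0.5):
    """Score a saliency map against a known backdoor-trigger region.

    saliency:     HxW array of attribution scores, assumed scaled to [0, 1].
    trigger_mask: HxW boolean array marking the trigger pixels (ground truth).
    """
    # Binarize the explanation: pixels the XAI method deems important.
    hot = saliency >= threshold
    inter = np.logical_and(hot, trigger_mask).sum()
    union = np.logical_or(hot, trigger_mask).sum()
    iou = inter / union if union else 0.0
    # Recall: fraction of the trigger that the explanation highlights.
    recall = inter / trigger_mask.sum() if trigger_mask.sum() else 0.0
    # Precision: fraction of highlighted pixels that lie inside the trigger.
    precision = inter / hot.sum() if hot.sum() else 0.0
    return iou, recall, precision
```

A method that "uncovers the entire trigger region" would score recall near 1.0; one that also highlights little else would score high precision and IoU as well.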
