Paper Title

Challenging common interpretability assumptions in feature attribution explanations

Authors

Jonathan Dinu, Jeffrey Bigham, J. Zico Kolter

Abstract

As machine learning and algorithmic decision making systems are increasingly being leveraged in high-stakes human-in-the-loop settings, there is a pressing need to understand the rationale of their predictions. Researchers have responded to this need with explainable AI (XAI), but often proclaim interpretability axiomatically without evaluation. When these systems are evaluated, they are often tested through offline simulations with proxy metrics of interpretability (such as model complexity). We empirically evaluate the veracity of three common interpretability assumptions through a large scale human-subjects experiment with a simple "placebo explanation" control. We find that feature attribution explanations provide marginal utility in our task for a human decision maker and in certain cases result in worse decisions due to cognitive and contextual confounders. This result challenges the assumed universal benefit of applying these methods and we hope this work will underscore the importance of human evaluation in XAI research. Supplemental materials -- including anonymized data from the experiment, code to replicate the study, an interactive demo of the experiment, and the models used in the analysis -- can be found at: https://doi.pizza/challenging-xai.
