Title


The Solvability of Interpretability Evaluation Metrics

Authors

Yilun Zhou, Julie Shah

Abstract


Feature attribution methods are popular for explaining neural network predictions, and they are often evaluated on metrics such as comprehensiveness and sufficiency. In this paper, we highlight an intriguing property of these metrics: their solvability. Concretely, we can define the problem of optimizing an explanation for a metric, which can be solved by beam search. This observation leads to the obvious yet unaddressed question: why do we use explainers (e.g., LIME) not based on solving the target metric, if the metric value represents explanation quality? We present a series of investigations showing strong performance of this beam search explainer and discuss its broader implication: a definition-evaluation duality of interpretability concepts. We implement the explainer and release the Python solvex package for models of text, image and tabular domains.
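As a rough illustration of the abstract's core idea, optimizing an explanation directly for a target metric can be framed as a beam search over feature subsets. The sketch below is hypothetical and not the solvex API: `metric` is a stand-in for any subset-scoring function (e.g., comprehensiveness of the selected features), and the function name and interface are illustrative assumptions.

```python
def beam_search_explanation(num_features, metric, size, beam_width=3):
    """Select `size` features that (approximately) maximize `metric`.

    At each step, every subset in the beam is extended by one unused
    feature, and the `beam_width` highest-scoring candidates are kept.
    `metric` is a hypothetical callable mapping a frozenset of feature
    indices to a score; it is not part of any real package's API.
    """
    beam = [frozenset()]
    for _ in range(size):
        # Generate all one-feature extensions of every subset in the beam.
        candidates = set()
        for subset in beam:
            for f in range(num_features):
                if f not in subset:
                    candidates.add(subset | {f})
        # Keep only the top-scoring candidates under the target metric.
        beam = sorted(candidates, key=metric, reverse=True)[:beam_width]
    return max(beam, key=metric)
```

With an additive toy metric (e.g., each feature has a fixed importance weight), the search recovers the highest-weight subset; for the saliency metrics discussed in the paper, `metric` would instead query the model on perturbed inputs.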
