Explaining Predictions by Approximating the Local Decision Boundary

Paper title

Explaining Predictions by Approximating the Local Decision Boundary

Authors

Georgios Vlassopoulos, Tim van Erven, Henry Brighton, Vlado Menkovski

Abstract

Constructing accurate model-agnostic explanations for opaque machine learning models remains a challenging task. Classification models for high-dimensional data, like images, are often inherently complex. To reduce this complexity, individual predictions may be explained locally, either in terms of a simpler local surrogate model or by communicating how the predictions contrast with those of another class. However, existing approaches still fall short in the following ways: a) they measure locality using a (Euclidean) metric that is not meaningful for non-linear high-dimensional data; or b) they do not attempt to explain the decision boundary, which is the most relevant characteristic of classifiers that are optimized for classification accuracy; or c) they do not give the user any freedom in specifying attributes that are meaningful to them. We address these issues in a new procedure for local decision boundary approximation (DBA). To construct a meaningful metric, we train a variational autoencoder to learn a Euclidean latent space of encoded data representations. We impose interpretability by exploiting attribute annotations to map the latent space to attributes that are meaningful to the user. A difficulty in evaluating explainability approaches is the lack of a ground truth. We address this by introducing a new benchmark data set with artificially generated Iris images, and showing that we can recover the latent attributes that locally determine the class. We further evaluate our approach on tabular data and on the CelebA image data set.
