A Multilinear Sampling Algorithm to Estimate Shapley Values

Title

A Multilinear Sampling Algorithm to Estimate Shapley Values

Authors

Ramin Okhrati, Aldo Lipani

Abstract

Shapley values are powerful analytical tools in game theory for measuring the importance of a player in a game. Owing to their axiomatic foundation and desirable properties such as efficiency, they have become popular for feature importance analysis in data science and machine learning. However, computing Shapley values from the original formula takes time exponential in the number of players, which becomes infeasible as the number of features grows. Castro et al. [1] developed a sampling algorithm to estimate Shapley values. In this work, we propose a new sampling method based on the multilinear extension technique used in game theory. The aim is to provide a more efficient sampling method for estimating Shapley values. Our method is applicable to any machine learning model, including multi-class classification and regression models. We apply the method to estimate Shapley values for multilayer perceptrons (MLPs) and, through experiments on two datasets, demonstrate that it provides more accurate estimates of the Shapley values by reducing the variance of the sampling statistics.
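
The three approaches contrasted in the abstract can be compared on a toy game. This is a minimal sketch under stated assumptions, not the authors' implementation. The game v(S) = (sum of w_i over S)^2, the weights, the function names, and the stratified choice of q are all illustrative. Here `exact_shapley` enumerates every coalition, so it is exponential in n. `permutation_shapley` averages marginal contributions over random player orderings, in the spirit of Castro et al. [1]. `multilinear_shapley` is a generic Monte Carlo estimate of Owen's multilinear-extension identity phi_i = ∫_0^1 E[v(S_q ∪ {i}) − v(S_q)] dq. In that identity, S_q contains each other player independently with probability q.

```python
import itertools
import math
import random

# Hypothetical toy game: v(S) = (sum of weights in S)^2.
# Its exact Shapley values are w_i * sum(W), i.e. [10, 20, 30, 40].
W = [1, 2, 3, 4]


def v(S):
    return sum(W[j] for j in S) ** 2


def exact_shapley(v, n):
    """Exact Shapley values by enumerating all coalitions (O(2^n) calls to v)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for a coalition S without i.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for combo in itertools.combinations(others, size):
                S = frozenset(combo)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi


def permutation_shapley(v, n, m, rng):
    """Average each player's marginal contribution over m random orderings."""
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(m):
        rng.shuffle(players)
        coalition = frozenset()
        prev = v(coalition)
        for i in players:
            coalition = coalition | {i}
            cur = v(coalition)
            phi[i] += cur - prev
            prev = cur
    return [p / m for p in phi]


def multilinear_shapley(v, n, m, rng):
    """Monte Carlo estimate of phi_i = int_0^1 E[v(S_q + i) - v(S_q)] dq."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(m):
            # Stratified q: one uniform draw in each interval [k/m, (k+1)/m).
            q = (k + rng.random()) / m
            # S_q includes each other player independently with probability q.
            S = frozenset(j for j in others if rng.random() < q)
            total += v(S | {i}) - v(S)
        phi[i] = total / m
    return phi


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:      ", exact_shapley(v, len(W)))
    print("permutation:", permutation_shapley(v, len(W), 2000, rng))
    print("multilinear:", multilinear_shapley(v, len(W), 2000, rng))
```

The identity holds because integrating q^|S| (1 − q)^(n−1−|S|) over [0, 1] yields exactly the Shapley weight |S|! (n − |S| − 1)! / n!. In a feature-importance setting, v(S) would instead be some model output computed with only the features in S present. How absent features are handled is an assumption this sketch does not address.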