When stakes are high: balancing accuracy and transparency with Model-Agnostic Interpretable Data-driven suRRogates

Paper title

When stakes are high: balancing accuracy and transparency with Model-Agnostic Interpretable Data-driven suRRogates

Authors

Roel Henckaerts, Katrien Antonio, Marie-Pier Côté

Abstract

Highly regulated industries, like banking and insurance, ask for transparent decision-making algorithms. At the same time, competitive markets are pushing for the use of complex black box models. We therefore present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr) suited for structured tabular data. Knowledge is extracted from a black box via partial dependence effects. These are used to perform smart feature engineering by grouping variable values. This results in a segmentation of the feature space with automatic variable selection. A transparent generalized linear model (GLM) is fit to the features in categorical format and their relevant interactions. We demonstrate our R package maidrr with a case study on general insurance claim frequency modeling for six publicly available datasets. Our maidrr GLM closely approximates a gradient boosting machine (GBM) black box and outperforms both a linear and tree surrogate as benchmarks.
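
To make the core idea in the abstract concrete, the sketch below computes a partial dependence effect for one feature of a black box and then groups feature values with similar effects. It is an illustration only, not the maidrr package's algorithm: the `black_box` function, the toy data, the `group_adjacent` helper, and the tolerance `tol` are all hypothetical, and the grouping rule (merge neighboring values while the effect stays within a tolerance) is a simplification chosen for brevity.

```python
from statistics import mean


def black_box(age, power):
    # Hypothetical stand-in for a fitted black box such as a GBM:
    # returns a predicted claim frequency.
    base = 0.20 if age < 25 else 0.10 if age < 60 else 0.12
    return base + 0.01 * power


def partial_dependence(model, data, feature, grid):
    # For each grid value, force `feature` to that value in every row
    # and average the model's predictions over the data.
    return {v: mean(model(**{**row, feature: v}) for row in data) for v in grid}


def group_adjacent(pd_effect, tol):
    # Assumed, simplified grouping rule: walk the values in order and start
    # a new group whenever the effect differs from the effect at the first
    # value of the current group by more than `tol`.
    groups = []
    anchor = None
    for v in sorted(pd_effect):
        if groups and abs(pd_effect[v] - anchor) <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
            anchor = pd_effect[v]
    return groups


# Toy portfolio: every combination of four ages and three engine powers.
data = [{"age": a, "power": p} for a in (20, 30, 45, 70) for p in (4, 6, 8)]
age_grid = [18, 22, 30, 45, 59, 60, 75]

pd_age = partial_dependence(black_box, data, "age", age_grid)
age_groups = group_adjacent(pd_age, tol=0.005)
print(age_groups)
```

Each resulting group would then serve as one level of a categorical feature in the surrogate GLM, which is the step the abstract describes as fitting the GLM to features in categorical format.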
