Paper Title
Debiasing Concept-based Explanations with Causal Analysis
Paper Authors
Paper Abstract
The concept-based explanation approach is a popular model interpretability tool because it expresses the reasons for a model's predictions in terms of concepts that are meaningful to domain experts. In this work, we study the problem of concepts being correlated with confounding information in the features. We propose a new causal prior graph for modeling the impact of unobserved variables, and a method that removes the impact of confounding information and noise using a two-stage regression technique borrowed from the instrumental variable literature. We also model the completeness of the concept set and show that our debiasing method works when the concepts are not complete. Our synthetic and real-world experiments demonstrate the success of our method in removing biases and improving the ranking of the concepts in terms of their contribution to the explanation of the predictions.
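The two-stage regression idea mentioned in the abstract can be illustrated with a generic two-stage least squares (2SLS) sketch. This is not the paper's procedure: the abstract does not say which variables act as instruments or how the concepts enter the regression. The synthetic data, the names `z`, `u`, `x`, `y`, and the helper `two_stage_least_squares` are all assumptions made for illustration. The sketch only shows why regressing on first-stage fitted values can remove the bias that an unobserved confounder introduces.

```python
import numpy as np


def two_stage_least_squares(z, x, y):
    """Generic 2SLS with an intercept (illustrative helper, not from the paper).

    z: instrument, x: confounded regressor, y: outcome (all 1-D arrays).
    """
    n = len(y)
    # Stage 1: regress the confounded regressor on the instrument.
    Z = np.column_stack([np.ones(n), z])
    stage1_coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ stage1_coef  # part of x explained by the instrument only
    # Stage 2: regress the outcome on the fitted values from stage 1.
    X_hat = np.column_stack([np.ones(n), x_hat])
    stage2_coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return stage2_coef[1:]  # drop the intercept


# Synthetic data (assumed setup): u is an unobserved confounder that
# affects both x and y; z affects y only through x.
rng = np.random.default_rng(0)
n = 20_000
true_beta = 2.0
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + rng.normal(size=n)
y = true_beta * x + 2.0 * u + rng.normal(size=n)

# Naive ordinary least squares is biased because x is correlated with u.
X = np.column_stack([np.ones(n), x])
ols_coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_ols = float(ols_coef[1])

# 2SLS uses only the variation in x that comes from the instrument.
beta_iv = float(two_stage_least_squares(z, x, y)[0])

print(f"true beta: {true_beta}, OLS: {beta_ols:.3f}, 2SLS: {beta_iv:.3f}")
```

In this setup the naive estimate is pulled away from the true coefficient by the confounder, while the two-stage estimate stays close to it. The gap between the two is the kind of bias that a debiasing step of this sort aims to remove.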