Paper Title
A Unified Taylor Framework for Revisiting Attribution Methods
Paper Authors
Paper Abstract
Attribution methods have been developed to understand the decision-making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods are often built upon empirical intuitions and heuristics. A general theoretical framework is still lacking, one that can not only unify these attribution methods but also reveal, on theoretical grounds, their rationales, fidelity, and limitations. To bridge this gap, we propose a Taylor attribution framework and reformulate seven mainstream attribution methods within it. Based on these reformulations, we analyze the attribution methods in terms of rationale, fidelity, and limitation. Moreover, we establish three principles for a good attribution in the Taylor attribution framework: low approximation error, correct contribution assignment, and unbiased baseline selection. Finally, by benchmarking on real-world datasets, we empirically validate the Taylor reformulations and reveal a positive correlation between attribution performance and the number of principles an attribution method follows.
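As a rough illustration of the idea behind a Taylor-based attribution (not the paper's own formulation), the sketch below assigns each feature the first-order term of a Taylor expansion of the model output between an input and a baseline. It then measures how much of the output change those first-order terms fail to explain, which is one way to read the "low approximation error" principle. The toy model, its analytic gradient, the zero baseline, and all function names are assumptions made for this example.

```python
def model(x):
    # Hypothetical toy model with one feature interaction (x0 * x1).
    return x[0] * x[1] + 2.0 * x[2]


def gradient(x):
    # Analytic gradient of the toy model above (assumed, for illustration).
    return [x[1], x[0], 2.0]


def first_order_attribution(x, baseline):
    # First-order Taylor term per feature: df/dx_i * (x_i - baseline_i),
    # with the gradient evaluated at the input x.
    grad = gradient(x)
    return [g * (xi - bi) for g, xi, bi in zip(grad, x, baseline)]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumed all-zero baseline

attributions = first_order_attribution(x, baseline)
output_change = model(x) - model(baseline)

# Part of the output change that the first-order terms do not account for.
approx_error = abs(output_change - sum(attributions))

print("attributions:", attributions)    # [2.0, 2.0, 6.0]
print("output change:", output_change)  # 8.0
print("approx error:", approx_error)    # 2.0
```

In this toy case the attributions sum to 10 while the output changes by only 8. The gap of 2 comes from the interaction term `x0 * x1`, which a purely first-order expansion cannot represent exactly.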