Paper Title
Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks
Paper Authors
Paper Abstract
Part-prototype networks (e.g., ProtoPNet, ProtoTree, and ProtoPool) have attracted broad research interest for their intrinsic interpretability and accuracy comparable to non-interpretable counterparts. However, recent works have found that the interpretability of prototypes is fragile, due to the semantic gap between similarities in the feature space and those in the input space. In this work, we strive to address this challenge by making the first attempt to quantitatively and objectively evaluate the interpretability of part-prototype networks. Specifically, we propose two evaluation metrics, termed the consistency score and the stability score, to evaluate explanation consistency across images and explanation robustness against perturbations, respectively, both of which are essential for putting explanations into practice. Furthermore, we propose an elaborately designed part-prototype network with a shallow-deep feature alignment (SDFA) module and a score aggregation (SA) module to improve the interpretability of prototypes. We conduct systematic evaluation experiments and provide substantial discussion to uncover the interpretability of existing part-prototype networks. Experiments on three benchmarks across nine architectures demonstrate that our model significantly outperforms the state of the art in both accuracy and interpretability. Our code is available at https://github.com/hqhQAQ/EvalProtoPNet.
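To make the two metrics concrete, below is a minimal Python sketch of how they could be computed, based only on the abstract's description: the consistency score checks whether a prototype activates the same object part across images, and the stability score checks whether the activated part survives input perturbations. The function names, the part-annotation inputs, and the 0.8 threshold are illustrative assumptions, not the paper's exact formulation (see the linked repository for the authors' implementation).

```python
import numpy as np

def consistency_score(part_hits, threshold=0.8):
    """Fraction of prototypes that land on a single object part in at
    least `threshold` of the evaluated images (assumed formulation).

    part_hits: dict mapping prototype id -> list of part ids, one per
        image (the annotated part containing the prototype's highest
        activation in that image).
    """
    consistent = 0
    for parts in part_hits.values():
        counts = np.bincount(np.asarray(parts))
        # A prototype is "consistent" if one part dominates its activations.
        if counts.max() / len(parts) >= threshold:
            consistent += 1
    return consistent / len(part_hits)

def stability_score(clean_hits, noisy_hits):
    """Fraction of (prototype, image) pairs whose activated part is
    unchanged after perturbing the input (e.g., additive noise)."""
    clean = np.asarray(clean_hits)
    noisy = np.asarray(noisy_hits)
    return float((clean == noisy).mean())

# Toy usage: prototype 0 hits part 3 in 4 of 5 images (consistent at 0.8),
# while prototype 1 scatters across parts (inconsistent).
hits = {0: [3, 3, 3, 2, 3], 1: [0, 4, 1, 4, 2]}
print(consistency_score(hits))                # -> 0.5
print(stability_score([3, 3, 2], [3, 1, 2]))  # -> ~0.667
```

Under these assumptions, both scores reduce to simple agreement rates over part assignments, which matches the abstract's framing of consistency across images and robustness against perturbations.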