Paper Title
Ground Truth Evaluation of Neural Network Explanations with CLEVR-XAI
Paper Authors
Paper Abstract
The rise of deep learning in today's applications has entailed an increasing need to explain the model's decisions beyond prediction performance in order to foster trust and accountability. Recently, the field of explainable AI (XAI) has developed methods that provide such explanations for already trained neural networks. In computer vision tasks, these explanations, termed heatmaps, visualize the contributions of individual pixels to the prediction. So far, XAI methods along with their heatmaps have mainly been validated qualitatively via human-based assessment, or evaluated through auxiliary proxy tasks such as pixel perturbation, weak object localization, or randomization tests. Due to the lack of an objective and commonly accepted quality measure for heatmaps, it has been debatable which XAI method performs best and whether explanations can be trusted at all. In the present work, we tackle the problem by proposing a ground-truth-based evaluation framework for XAI methods based on the CLEVR visual question answering task. Our framework provides a (1) selective, (2) controlled, and (3) realistic testbed for the evaluation of neural network explanations. We compare ten different explanation methods, resulting in new insights about the quality and properties of XAI methods, sometimes contradicting conclusions from previous comparative studies. The CLEVR-XAI dataset and the benchmarking code can be found at https://github.com/ahmedmagdiosman/clevr-xai.
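To make the idea of a ground-truth-based heatmap evaluation concrete, the sketch below shows one plausible metric: the fraction of an explanation's positive relevance that falls inside a ground-truth object mask. This is an illustrative assumption, not necessarily the exact protocol or metric definitions used in the paper; the function name relevance_mass_accuracy and the toy arrays are hypothetical.

```python
import numpy as np

def relevance_mass_accuracy(heatmap: np.ndarray, gt_mask: np.ndarray) -> float:
    """Share of total positive relevance lying inside the ground-truth mask.

    A hedged sketch of a ground-truth-based heatmap metric: 1.0 means all
    positive relevance is on ground-truth pixels, 0.0 means none of it is.
    """
    pos = np.clip(heatmap, 0, None)           # keep positive evidence only
    total = pos.sum()
    if total == 0:
        return 0.0
    return float(pos[gt_mask.astype(bool)].sum() / total)

# Toy usage: a 4x4 heatmap whose relevance mostly covers the masked object.
heatmap = np.array([[0.0, 0.1, 0.0, 0.0],
                    [0.0, 0.8, 0.7, 0.0],
                    [0.0, 0.6, 0.9, 0.0],
                    [0.0, 0.0, 0.0, 0.2]])
gt_mask = np.zeros((4, 4), dtype=bool)
gt_mask[1:3, 1:3] = True                      # ground-truth object region
print(relevance_mass_accuracy(heatmap, gt_mask))  # ~0.91
```

Under such a metric, different XAI methods can be ranked by how well their heatmaps concentrate relevance on the objects that the CLEVR question actually refers to.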