Paper Title

A Study on Multimodal and Interactive Explanations for Visual Question Answering

Paper Authors

Kamran Alipour, Jurgen P. Schulze, Yi Yao, Avi Ziskind, Giedrius Burachas

Abstract

Explainability and interpretability of AI models is an essential factor affecting the safety of AI. While various explainable AI (XAI) approaches aim at mitigating the lack of transparency in deep networks, evidence of the effectiveness of these approaches in improving usability, trust, and understanding of AI systems is still missing. We evaluate multimodal explanations in the setting of a Visual Question Answering (VQA) task, by asking users to predict the response accuracy of a VQA agent with and without explanations. We use between-subjects and within-subjects experiments to probe explanation effectiveness in terms of improving user prediction accuracy, confidence, and reliance, among other factors. The results indicate that the explanations help improve human prediction accuracy, especially in trials when the VQA system's answer is inaccurate. Furthermore, we introduce active attention, a novel method for evaluating causal attentional effects through intervention by editing attention maps. User explanation ratings are strongly correlated with human prediction accuracy and suggest the efficacy of these explanations in human-machine AI collaboration tasks.
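To make the "intervention by editing attention maps" idea concrete, the following is a minimal illustrative Python sketch, not the authors' implementation. The `vqa_model` object, its `get_attention` and `answer` methods, and the zero-and-renormalize edit are hypothetical stand-ins for any attention-based VQA agent; the paper itself does not specify this interface.

```python
# Illustrative sketch of a causal attention intervention for a VQA agent.
# NOTE: the vqa_model interface below is hypothetical, not the paper's API.
import numpy as np

def edit_attention(attention_map: np.ndarray, region_mask: np.ndarray) -> np.ndarray:
    """Suppress attention inside a selected region and renormalize.

    attention_map: H x W non-negative spatial attention over image features.
    region_mask:   H x W boolean mask marking the region to suppress.
    """
    edited = attention_map.copy()
    edited[region_mask] = 0.0
    total = edited.sum()
    if total > 0:
        edited /= total  # keep the edited map a valid distribution
    return edited

def causal_attention_effect(vqa_model, image, question, region_mask):
    """Compare the agent's answer before and after the attention edit."""
    original_attn = vqa_model.get_attention(image, question)  # hypothetical call
    baseline = vqa_model.answer(image, question, attention=original_attn)
    intervened = vqa_model.answer(
        image, question, attention=edit_attention(original_attn, region_mask)
    )
    return baseline, intervened, baseline != intervened
```

If the answer changes when attention on a region is suppressed, that region had a causal effect on the agent's response; this is the kind of evidence the active-attention evaluation is meant to surface.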
