Paper title
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency
Paper authors
Paper abstract
Recent research in Visual Question Answering (VQA) has revealed that state-of-the-art models are inconsistent in their understanding of the world: they correctly answer seemingly difficult questions that require reasoning, yet get simpler associated sub-questions wrong. These sub-questions pertain to lower-level visual concepts in the image that a model should ideally understand in order to answer the higher-level question correctly. To address this, we first present a gradient-based interpretability approach to determine the questions most strongly correlated with the reasoning question on an image, and use it to evaluate VQA models on their ability to identify the relevant sub-questions needed to answer a reasoning question. Next, we propose a contrastive gradient learning based approach called Sub-question Oriented Tuning (SOrT), which encourages models to rank relevant sub-questions higher than irrelevant questions for an <image, reasoning-question> pair. We show that SOrT improves model consistency by up to 6.5 percentage points over existing baselines, while also improving visual grounding.
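The ranking idea in the abstract can be illustrated with a small, hedged sketch. The abstract does not state which similarity measure or loss SOrT uses, so everything here is an assumption made for illustration: the gradient-based explanation of each question is a plain vector, relatedness is cosine similarity, and the contrastive objective is a margin-based hinge. The function names (`cosine_similarity`, `rank_questions`, `contrastive_margin_loss`), the `margin` value, and the toy vectors are hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_questions(reasoning_grad, candidate_grads):
    """Order candidate questions by similarity of their gradient vectors
    to the reasoning question's gradient vector, most similar first."""
    scores = {
        name: cosine_similarity(reasoning_grad, grad)
        for name, grad in candidate_grads.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def contrastive_margin_loss(reasoning_grad, relevant_grad, irrelevant_grad, margin=0.2):
    """Hinge-style loss (an assumed form): zero once the relevant sub-question
    is more similar to the reasoning question than the irrelevant question
    by at least `margin`, positive otherwise."""
    sim_relevant = cosine_similarity(reasoning_grad, relevant_grad)
    sim_irrelevant = cosine_similarity(reasoning_grad, irrelevant_grad)
    return max(0.0, margin - (sim_relevant - sim_irrelevant))


# Toy gradient vectors (made up for illustration only).
reasoning = [1.0, 0.0, 1.0]
candidates = {
    "sub_question": [0.9, 0.1, 0.8],
    "irrelevant_question": [0.0, 1.0, 0.0],
}

ranking = rank_questions(reasoning, candidates)
loss = contrastive_margin_loss(
    reasoning, candidates["sub_question"], candidates["irrelevant_question"]
)
```

In this toy setup the sub-question already ranks above the irrelevant question, so the hinge loss is zero. Swapping the two vectors gives a positive loss, which is the signal a training procedure of this kind would use to push the ranking back in the right direction.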