Paper title
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Authors
Abstract
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image. In this work we study open-domain knowledge, the setting where the knowledge required to answer a question is not given or annotated at either training or test time. We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models. Second, explicit, symbolic knowledge encoded in knowledge bases. Our approach combines both: it exploits the powerful implicit reasoning of transformer models for answer prediction, and it integrates symbolic representations from a knowledge graph without ever losing their explicit semantics to an implicit embedding. We combine diverse sources of knowledge to cover the wide variety of knowledge needed to solve knowledge-based questions. We show that our approach, KRISP (Knowledge Reasoning with Implicit and Symbolic rePresentations), significantly outperforms the state of the art on OK-VQA, the largest available dataset for open-domain knowledge-based VQA. We show with extensive ablations that while our model successfully exploits implicit knowledge reasoning, the symbolic answer module, which explicitly connects the knowledge graph to the answer vocabulary, is critical to the performance of our method and generalizes to rare answers.
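The idea of a symbolic answer module that ties knowledge-graph nodes directly to the answer vocabulary can be illustrated with a small sketch. This is not the authors' implementation: the function names, the name-matching rule between nodes and answers, and the max-based fusion are all assumptions made for illustration, and the scores are made-up numbers rather than results.

```python
from typing import Dict, List


def symbolic_answer_scores(node_scores: Dict[str, float],
                           answer_vocab: List[str]) -> Dict[str, float]:
    """Hypothetical symbolic module: each answer inherits the score of the
    knowledge-graph node with the same name; answers without a node get 0."""
    return {answer: node_scores.get(answer, 0.0) for answer in answer_vocab}


def fuse_predictions(implicit: Dict[str, float],
                     symbolic: Dict[str, float]) -> str:
    """Assumed late fusion: keep the larger of the two scores per answer,
    then return the highest-scoring answer (ties broken alphabetically)."""
    answers = sorted(set(implicit) | set(symbolic))
    combined = {a: max(implicit.get(a, 0.0), symbolic.get(a, 0.0))
                for a in answers}
    return max(answers, key=lambda a: combined[a])


# Illustrative inputs only: a tiny answer vocabulary and made-up scores.
answer_vocab = ["dog", "frisbee", "poodle"]
implicit_scores = {"dog": 0.6, "frisbee": 0.3, "poodle": 0.1}  # transformer head
graph_node_scores = {"poodle": 0.9, "animal": 0.8}             # knowledge graph

symbolic_scores = symbolic_answer_scores(graph_node_scores, answer_vocab)
prediction = fuse_predictions(implicit_scores, symbolic_scores)
print(prediction)
```

Because the graph node keeps its identity as a named concept instead of being folded into an embedding, a rare answer such as "poodle" can be selected even when the implicit head assigns it a low score. Graph nodes that are not in the answer vocabulary ("animal" here) are simply ignored by this sketch.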