Paper Title

Generating Rationales in Visual Question Answering

Paper Authors

Hammad A. Ayyubi, Md. Mehrab Tanjim, Julian J. McAuley, Garrison W. Cottrell

Paper Abstract

Despite recent advances in Visual Question Answering (VQA), it remains a challenge to determine how much success can be attributed to sound reasoning and comprehension ability. We seek to investigate this question by proposing a new task of rationale generation. Essentially, we task a VQA model with generating rationales for the answers it predicts. We use data from the Visual Commonsense Reasoning (VCR) task, as it contains ground-truth rationales along with visual questions and answers. We first investigate commonsense understanding in one of the leading VCR models, ViLBERT, by generating rationales from pretrained weights using a state-of-the-art language model, GPT-2. Next, we seek to jointly train ViLBERT with GPT-2 in an end-to-end fashion with the dual task of predicting the answer in VQA and generating rationales. We show that this kind of training injects commonsense understanding in the VQA model through quantitative and qualitative evaluation metrics.
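The abstract describes a dual objective, answer prediction for VQA plus rationale generation with GPT-2, trained end to end. Below is a minimal, hypothetical sketch of how such a joint loss could be wired up; it is not the authors' released implementation. It assumes a ViLBERT-style multimodal encoder that exposes a pooled feature vector, Hugging Face's GPT2LMHeadModel as the rationale decoder, and a simple weighted sum of the two losses. The names VQAWithRationale, answer_head, fusion, and alpha are illustrative only.

```python
# Minimal sketch (assumptions noted above), not the paper's official code.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class VQAWithRationale(nn.Module):
    def __init__(self, vilbert_encoder, num_answers, gpt2_name="gpt2", alpha=0.5):
        super().__init__()
        self.encoder = vilbert_encoder                        # ViLBERT-style image+question encoder (assumed)
        self.answer_head = nn.Linear(self.encoder.hidden_size, num_answers)
        self.decoder = GPT2LMHeadModel.from_pretrained(gpt2_name)
        # project multimodal features into GPT-2's embedding space as a one-token prefix
        self.fusion = nn.Linear(self.encoder.hidden_size, self.decoder.config.n_embd)
        self.alpha = alpha                                    # weight between the two losses

    def forward(self, image_feats, question_ids, answer_labels, rationale_ids):
        pooled = self.encoder(image_feats, question_ids)      # (B, hidden), pooled multimodal feature
        # Task 1: answer prediction (VQA)
        answer_logits = self.answer_head(pooled)
        vqa_loss = nn.functional.cross_entropy(answer_logits, answer_labels)
        # Task 2: rationale generation, conditioning GPT-2 on the multimodal prefix
        prefix = self.fusion(pooled).unsqueeze(1)             # (B, 1, n_embd)
        tok_emb = self.decoder.transformer.wte(rationale_ids) # (B, T, n_embd)
        inputs_embeds = torch.cat([prefix, tok_emb], dim=1)
        # -100 masks the prefix position out of the language-modeling loss
        labels = torch.cat(
            [torch.full_like(rationale_ids[:, :1], -100), rationale_ids], dim=1
        )
        lm_out = self.decoder(inputs_embeds=inputs_embeds, labels=labels)
        # joint objective: both losses backpropagate through the shared encoder
        return self.alpha * vqa_loss + (1 - self.alpha) * lm_out.loss
```

In practice the prefix length, the loss weight alpha, and exactly how visual features are injected into GPT-2 all matter; the sketch only illustrates that the answer-prediction and rationale-generation losses are optimized jointly through the shared multimodal encoder.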
