Paper Title
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Paper Authors
Paper Abstract
Visual question answering (VQA) is the multi-modal task of answering natural language questions about an input image. Through cross-dataset adaptation methods, it is possible to transfer knowledge from a source dataset with a large number of training samples to a target dataset whose training set is limited. When a VQA model trained on one dataset's training set fails to adapt to another, it is hard to identify the underlying cause of the domain mismatch, as there could be a multitude of reasons, such as image distribution mismatch and question distribution mismatch. At UCLA, we are working on a visual question generation (VQG) module that automatically generates OOD shifts, aiding the systematic evaluation of the cross-dataset adaptation capabilities of VQA models.
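To make the evaluation idea concrete, the following is a minimal sketch (not the paper's code) of how generated question shifts could help disentangle image-distribution mismatch from question-distribution mismatch. All names here (`vqa_model`, `vqg_model`, `accuracy`, `evaluate_shift`, the `style` argument) are hypothetical placeholders, assumed only for illustration.

```python
# Hypothetical sketch: comparing a VQA model's accuracy on its source test set,
# on a target test set (image + question shift), and on the target images paired
# with VQG-generated questions in the source dataset's question style
# (image shift only). None of these interfaces come from the paper.

def accuracy(vqa_model, examples):
    """Fraction of (image, question, answer) examples answered correctly."""
    correct = sum(
        vqa_model.predict(ex["image"], ex["question"]) == ex["answer"]
        for ex in examples
    )
    return correct / len(examples)


def evaluate_shift(vqa_model, vqg_model, source_test, target_test):
    # Baseline: in-domain performance on the source test split.
    acc_source = accuracy(vqa_model, source_test)

    # Full cross-dataset shift: both images and questions from the target dataset.
    acc_target = accuracy(vqa_model, target_test)

    # Controlled shift: keep target images, but replace each question/answer pair
    # with one generated by the (assumed) VQG module in the source question style,
    # so any remaining drop is attributable to image distribution mismatch.
    regenerated = [
        {**ex, **vqg_model.generate(ex["image"], style="source")}  # returns {"question": ..., "answer": ...}
        for ex in target_test
    ]
    acc_target_regen = accuracy(vqa_model, regenerated)

    return {
        "source test": acc_source,
        "target test (image + question shift)": acc_target,
        "target images, source-style questions (image shift only)": acc_target_regen,
    }
```

Comparing the three numbers indicates whether a failure to adapt is driven mainly by the questions, the images, or both.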