Paper Title
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Paper Authors
Paper Abstract
Visual question answering (VQA) is the multi-modal task of answering natural language questions about an input image. Through cross-dataset adaptation methods, it is possible to transfer knowledge from a source dataset with a large number of training samples to a target dataset whose training set is limited. When a VQA model trained on one dataset's training set fails to adapt to another, it is hard to identify the underlying cause of the domain mismatch, as there could be a multitude of reasons, such as image distribution mismatch and question distribution mismatch. At UCLA, we are working on a visual question generation (VQG) module that automatically generates OOD shifts, aiding the systematic evaluation of the cross-dataset adaptation capabilities of VQA models.
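To make the evaluation idea concrete, the following is a minimal sketch (not the paper's code) of how generated question shifts could help disentangle image-distribution mismatch from question-distribution mismatch. All names here (`vqa_model`, `vqg_model`, `accuracy`, `evaluate_shift`, the `style` argument) are hypothetical placeholders, assumed only for illustration.

```python
# Hypothetical sketch: comparing a VQA model's accuracy on its source test set,
# on a target test set (image + question shift), and on the target images paired
# with VQG-generated questions in the source dataset's question style
# (image shift only). None of these interfaces come from the paper.

def accuracy(vqa_model, examples):
    """Fraction of (image, question, answer) examples answered correctly."""
    correct = sum(
        vqa_model.predict(ex["image"], ex["question"]) == ex["answer"]
        for ex in examples
    )
    return correct / len(examples)


def evaluate_shift(vqa_model, vqg_model, source_test, target_test):
    # Baseline: in-domain performance on the source test split.
    acc_source = accuracy(vqa_model, source_test)

    # Full cross-dataset shift: both images and questions from the target dataset.
    acc_target = accuracy(vqa_model, target_test)

    # Controlled shift: keep target images, but replace each question/answer pair
    # with one generated by the (assumed) VQG module in the source question style,
    # so any remaining drop is attributable to image distribution mismatch.
    regenerated = [
        {**ex, **vqg_model.generate(ex["image"], style="source")}  # returns {"question": ..., "answer": ...}
        for ex in target_test
    ]
    acc_target_regen = accuracy(vqa_model, regenerated)

    return {
        "source test": acc_source,
        "target test (image + question shift)": acc_target,
        "target images, source-style questions (image shift only)": acc_target_regen,
    }
```

Comparing the three numbers indicates whether a failure to adapt is driven mainly by the questions, the images, or both.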