Paper Title
Declaration-based Prompt Tuning for Visual Question Answering
Paper Authors
Paper Abstract
In recent years, the pre-training-then-fine-tuning paradigm has yielded immense success on a wide spectrum of cross-modal tasks, such as visual question answering (VQA), in which a visual-language (VL) model is first optimized via self-supervised task objectives, e.g., masked language modeling (MLM) and image-text matching (ITM), and then fine-tuned to adapt to the downstream task (e.g., VQA) via a brand-new objective function, e.g., answer prediction. The inconsistency of the objective forms not only severely limits the generalization of pre-trained VL models to downstream tasks, but also requires a large amount of labeled data for fine-tuning. To alleviate this problem, we propose an innovative VL fine-tuning paradigm (named Declaration-based Prompt Tuning, abbreviated as DPT), which jointly optimizes the objectives of pre-training and fine-tuning of the VQA model, boosting the effective adaptation of pre-trained VL models to the downstream task. Specifically, DPT reformulates the objective form of the VQA task via (1) textual adaptation, which converts the given question into a declarative sentence form for prompt tuning, and (2) task adaptation, which optimizes the objective function of the VQA problem in the manner of the pre-training phase. Experimental results on the GQA dataset show that DPT outperforms the fine-tuned counterpart by a large margin in terms of accuracy in both fully-supervised (2.68%) and zero-shot/few-shot (over 31%) settings. All data and code will be made available to facilitate future research.
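To make the reformulation described in the abstract concrete, below is a minimal, illustrative sketch (not the authors' code) of the two adaptations: the question is rewritten as a declarative prompt with a [MASK] slot (textual adaptation), and candidate answers are scored at that slot with a masked-language-modeling head rather than a new answer-classification head (task adaptation). As an assumption for brevity, a text-only BERT MLM stands in for the pre-trained VL model, the image features the real method conditions on are omitted, and the question-to-declaration rewrite is hard-coded rather than produced by the paper's converter.

```python
# Hedged sketch of declaration-based prompt tuning for VQA (assumptions noted above).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# (1) Textual adaptation: the interrogative question becomes a declarative
#     prompt whose answer position is the [MASK] token.
question = "What color is the man's shirt?"          # original VQA question
declaration = "The color of the man's shirt is [MASK]."

# (2) Task adaptation: answer prediction is cast as masked language modeling,
#     i.e., candidate answers are scored at the [MASK] position instead of
#     being classified by a brand-new answer head.
candidates = ["red", "blue", "green", "white"]        # toy answer vocabulary
inputs = tokenizer(declaration, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    mask_logits = model(**inputs).logits[0, mask_pos]  # [1, vocab_size] scores

cand_ids = [tokenizer.convert_tokens_to_ids(c) for c in candidates]
scores = mask_logits[0, cand_ids]
print(candidates[int(scores.argmax())])                # highest-scoring answer
```

In the paper's setting, the same idea is applied on top of a pre-trained VL model that also consumes image region features, so the [MASK] prediction is grounded in the image rather than in text alone; the sketch only shows how the VQA objective is aligned with the MLM pre-training objective.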