Paper Title
Rethinking Data Augmentation for Robust Visual Question Answering
Authors
Abstract
Data Augmentation (DA), i.e., generating extra training samples beyond the original training set, has been widely used in today's unbiased VQA models to mitigate language biases. Current mainstream DA strategies are synthesis-based methods, which synthesize new samples by either editing some visual regions/words or regenerating them from scratch. However, these synthetic samples are always unnatural and error-prone. To avoid this issue, a recent DA work composes new augmented samples by randomly pairing pristine images and other human-written questions. Unfortunately, to guarantee that augmented samples have reasonable ground-truth answers, it manually designs a set of heuristic rules for several question types, which severely limits its generalization ability. To this end, we propose a new Knowledge Distillation based Data Augmentation for VQA, dubbed KDDAug. Specifically, we first relax the requirements for reasonable image-question pairs, so that they can be easily applied to any question type. Then, we design a knowledge distillation (KD) based answer assignment to generate pseudo answers for all composed image-question pairs, which are robust to both in-domain and out-of-distribution settings. Since KDDAug is a model-agnostic DA strategy, it can be seamlessly incorporated into any VQA architecture. Extensive ablation studies on multiple backbones and benchmarks demonstrate the effectiveness and generalization ability of KDDAug.
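The two steps named in the abstract (composing image-question pairs, then assigning pseudo answers through a teacher model) can be illustrated with a minimal sketch. The abstract does not specify the pairing criteria, the teacher, or the form of the pseudo answers, so everything below is an assumption made for illustration: `compose_pairs`, `assign_pseudo_answers`, `toy_teacher`, the `ANSWERS` vocabulary, and the temperature-softened softmax are hypothetical names and choices, not the authors' implementation.

```python
import math
import random

# Hypothetical answer vocabulary, used only for this illustration.
ANSWERS = ["yes", "no", "red"]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def compose_pairs(samples, num_pairs, rng):
    """Randomly pair an image with a question written for a different image.

    `samples` is a list of (image_id, question) tuples from the original set.
    Assumption: any image/question combination is accepted; the abstract only
    says the pairing requirements are relaxed, not what they are.
    """
    if len({image_id for image_id, _ in samples}) < 2:
        raise ValueError("need questions from at least two different images")
    pairs = []
    while len(pairs) < num_pairs:
        (image_id, _), (other_id, question) = rng.sample(samples, 2)
        if image_id != other_id:
            pairs.append((image_id, question))
    return pairs


def assign_pseudo_answers(pairs, teacher, temperature=2.0):
    """Label each composed pair with the teacher's softened answer distribution."""
    return [
        (image_id, question, softmax(teacher(image_id, question), temperature))
        for image_id, question in pairs
    ]


def toy_teacher(image_id, question):
    """Hypothetical stand-in for a trained VQA model: returns logits over ANSWERS."""
    return [2.0, 1.0, 0.0] if question.startswith("Is") else [0.0, 0.0, 3.0]


samples = [
    ("img1", "Is there a dog?"),
    ("img2", "What color is the car?"),
    ("img3", "Is it raining?"),
]
rng = random.Random(0)
augmented = assign_pseudo_answers(compose_pairs(samples, 4, rng), toy_teacher)
```

Each entry of `augmented` is an `(image_id, question, distribution)` triple that a student VQA model could be trained on alongside the original samples. In a real setting the teacher would be a trained VQA model rather than a fixed rule, and how it is chosen for in-domain versus out-of-distribution robustness is not stated in the abstract.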