Paper Title

To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo

Authors

Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

Abstract

We present a debiased dataset for the Person-centric Visual Grounding (PCVG) task, first proposed by Cui et al. (2021) with the Who's Waldo dataset. Given an image and a caption, PCVG requires pairing up a person's name mentioned in the caption with a bounding box that points to that person in the image. We find that the original Who's Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image. Naturally, models trained on these biased data lead to over-estimation of performance on the benchmark. To ensure that models are correct for the correct reasons, we design automated tools to filter and debias the original dataset by ruling out all examples with insufficient context, such as those with no verb or with a long chain of conjunct names in their captions. Our experiments show that our new sub-sampled dataset contains less bias, with much lower heuristic performance and wider gaps between heuristic and supervised methods. We also demonstrate that the same benchmark model trained on our debiased training set outperforms the one trained on the original biased (and larger) training set when both are evaluated on our debiased test set. We argue that our debiased dataset offers the PCVG task a more practical baseline for reliable benchmarking and future improvements.
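The two heuristics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(x1, y1, x2, y2)` box format, and the choice to rank boxes by their horizontal center are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Assumed box format (not specified in the abstract): (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of a box, clamped at zero for degenerate boxes."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def first_name_largest_box(names: List[str], boxes: List[Box]) -> Dict[str, int]:
    """Hypothetical heuristic: map the first mentioned name to the largest box."""
    if not names or not boxes:
        return {}
    largest = max(range(len(boxes)), key=lambda i: box_area(boxes[i]))
    return {names[0]: largest}


def left_to_right(names: List[str], boxes: List[Box]) -> Dict[str, int]:
    """Hypothetical heuristic: map names, in caption order, to boxes sorted
    from left to right by horizontal center."""
    order = sorted(range(len(boxes)), key=lambda i: (boxes[i][0] + boxes[i][2]) / 2)
    # zip stops at the shorter sequence, so surplus names or boxes stay unassigned.
    return dict(zip(names, order))


if __name__ == "__main__":
    names = ["Alice", "Bob"]                       # names in caption order
    boxes = [(50, 0, 60, 10), (0, 0, 40, 40)]      # box 1 is larger and further left
    print(first_name_largest_box(names, boxes))
    print(left_to_right(names, boxes))
```

Neither function looks at the image content or at the caption beyond the order of names, which is why a sample that such rules answer correctly says little about whether a model has actually grounded the text in the image.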
