Paper Title

What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets

Authors

Jianing Yang, Yuying Zhu, Yongxin Wang, Ruitao Yi, Amir Zadeh, Louis-Philippe Morency

Abstract

Question answering biases in video QA datasets can mislead multimodal models into overfitting to QA artifacts and jeopardize their ability to generalize. Understanding how strong these QA biases are and where they come from helps the community measure progress more accurately and gives researchers insights for debugging their models. In this paper, we analyze QA biases in popular video question answering datasets and discover that pretrained language models can answer 37-48% of questions correctly without using any multimodal context information, far exceeding the 20% random-guess baseline for 5-choose-1 multiple-choice questions. Our ablation study shows that biases can come from annotators and question types. Specifically, questions from annotators seen during training are predicted better by the model, and reasoning/abstract questions incur more bias than factual, direct questions. We also show empirically that using annotator-non-overlapping train-test splits can reduce QA biases for video QA datasets.
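A minimal sketch of the kind of QA-only probe the abstract describes: ranking the five answer candidates with a pretrained language model alone, ignoring the video entirely. The paper fine-tunes pretrained language models on the QA pairs; the zero-shot likelihood ranking below (GPT-2 via Hugging Face transformers) is only an illustrative stand-in, and the function name is ours.

```python
# Illustrative QA-only probe (not the authors' exact setup): rank the five
# answer candidates with a pretrained language model, using no video context.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def answer_without_video(question, candidates):
    """Return the index of the candidate the LM scores as most likely,
    judging only from the question-answer text."""
    scores = []
    for cand in candidates:
        ids = tokenizer(f"Q: {question} A: {cand}", return_tensors="pt").input_ids
        with torch.no_grad():
            # .loss is the mean negative log-likelihood per token; negate it so
            # a higher score means "more likely under the language model".
            scores.append(-model(ids, labels=ids).loss.item())
    return max(range(len(candidates)), key=scores.__getitem__)
```

The abstract's proposed mitigation, an annotator-non-overlapping train-test split, can be sketched in a few lines as well. The field name `annotator_id` is hypothetical; the actual dataset schema may differ.

```python
# Sketch of an annotator-non-overlapping train-test split: every annotator's
# examples land entirely in train or entirely in test, never both.
import random
from collections import defaultdict

def annotator_disjoint_split(examples, test_ratio=0.2, seed=0):
    by_annotator = defaultdict(list)
    for ex in examples:
        by_annotator[ex["annotator_id"]].append(ex)  # hypothetical field name
    annotators = sorted(by_annotator)
    random.Random(seed).shuffle(annotators)
    n_test = max(1, int(len(annotators) * test_ratio))
    test = [ex for a in annotators[:n_test] for ex in by_annotator[a]]
    train = [ex for a in annotators[n_test:] for ex in by_annotator[a]]
    return train, test
```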
