A Dataset and Baselines for Visual Question Answering on Art

Paper title

A Dataset and Baselines for Visual Question Answering on Art

Authors

Noa Garcia, Chentao Ye, Zihua Liu, Qingtao Hu, Mayu Otani, Chenhui Chu, Yuta Nakashima, Teruko Mitamura

Abstract

Answering questions related to art pieces (paintings) is a difficult task, as it implies the understanding of not only the visual information that is shown in the picture, but also the contextual knowledge that is acquired through the study of the history of art. In this work, we introduce our first attempt towards building a new dataset, coined AQUA (Art QUestion Answering). The question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods based on paintings and comments provided in an existing art understanding dataset. The QA pairs are cleansed by crowdsourcing workers with respect to their grammatical correctness, answerability, and answers' correctness. Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions. We also present a two-branch model as baseline, where the visual and knowledge questions are handled independently. We extensively compare our baseline model against the state-of-the-art models for question answering, and we provide a comprehensive study about the challenges and potential future directions for visual question answering on art.
