Paper title
Is GPT-3 all you need for Visual Question Answering in Cultural Heritage?
Paper authors
Abstract
In recent years, Deep Learning and Computer Vision have become highly relevant in the Cultural Heritage domain, with many applications such as audio smart guides, interactive museums, and augmented reality. All these technologies require large amounts of data to work effectively and be useful to the user. In the context of artworks, such data is annotated by experts in an expensive and time-consuming process. In particular, for each artwork, an image of the artwork and a description sheet have to be collected in order to perform common tasks like Visual Question Answering. In this paper we propose a method for Visual Question Answering that generates a description sheet at runtime, which can be used to answer both visual and contextual questions about the artwork, completely avoiding the image and annotation process. For this purpose, we investigate the use of GPT-3 for generating descriptions of artworks, analyzing the quality of the generated descriptions through captioning metrics. Finally, we evaluate the performance on Visual Question Answering and captioning tasks.