Captioning Images with Novel Objects via Online Vocabulary Expansion

Paper Title

Captioning Images with Novel Objects via Online Vocabulary Expansion

Authors

Mikihiro Tanaka, Tatsuya Harada

Abstract

In this study, we introduce a low-cost method for generating descriptions from images containing novel objects. Generally, constructing a model that can describe images containing novel objects is costly for two reasons: (1) a large amount of data must be collected for each category, and (2) the entire system must be retrained. When humans see only a few examples of a novel object, they can estimate its properties by associating its appearance with known objects. Accordingly, we propose a method that can describe images containing novel objects without retraining, using word embeddings of the objects that are estimated from only a small number of image features of those objects. The method can be integrated with general image-captioning models. The experimental results show the effectiveness of our approach.
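
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of expanding a vocabulary without retraining. It is not the authors' method. It assumes a fixed, already-learned linear map (`weight`, hypothetical) from image features to the word-embedding space, estimates the new word's embedding as the normalized mean of a few projected image features, and assumes the captioner scores words by a dot product between a decoder state and the word embeddings. All names and the toy data are illustrative.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def project(feature, weight):
    """Apply a linear map (given as rows of `weight`) to one image feature."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def expand_vocabulary(vocab, embeddings, new_word, image_features, weight):
    """Return a vocabulary and embedding table extended with `new_word`.

    The new embedding is estimated from a few image features only;
    no existing parameter is modified (assumption: this stands in for
    "without retraining").
    """
    projected = [project(f, weight) for f in image_features]
    estimate = normalize(mean_vector(projected))
    return vocab + [new_word], embeddings + [estimate]


def score_words(hidden, embeddings):
    """Dot-product score of a decoder state against every word embedding."""
    return [sum(h * e for h, e in zip(hidden, emb)) for emb in embeddings]


# Toy data (hypothetical): 3-d image features, 2-d word embeddings.
vocab = ["dog", "car"]
embeddings = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
zebra_features = [[0.9, 0.1, 0.3], [0.7, 0.3, 0.5]]  # two example images

vocab, embeddings = expand_vocabulary(
    vocab, embeddings, "zebra", zebra_features, weight
)

# A decoder state close to the new object's estimated embedding.
scores = score_words([0.9, 0.2], embeddings)
best = vocab[scores.index(max(scores))]
print(best)
```

Because the output layer in this sketch is just a lookup over the embedding table, appending one row is enough to make the new word available to the captioner; whether the actual system works this way is not stated in the abstract.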
