Paper title
Captioning Images with Novel Objects via Online Vocabulary Expansion
Paper authors
Paper abstract
In this study, we introduce a low-cost method for generating descriptions of images that contain novel objects. Constructing a model that can describe images with novel objects is generally costly, because it requires (1) collecting a large amount of data for each category and (2) retraining the entire system. When humans see only a few examples of a novel object, they can estimate its properties by associating its appearance with known objects. Accordingly, we propose a method that can describe images with novel objects without retraining, using word embeddings of the objects that are estimated from only a small number of image features of those objects. The method can be integrated with general image-captioning models. The experimental results show the effectiveness of our approach.
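The abstract does not say how the word embeddings are estimated, so the following is only a minimal sketch of the general idea of online vocabulary expansion, not the authors' implementation. It assumes that image features already live in the same vector space as the word embeddings, that a novel word's embedding can be approximated by the normalized mean of a few image features, and that the captioner scores words by a dot product with its decoder state. All function names, vectors, and the word "zebra" are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def estimate_embedding(image_features):
    """Assumption: approximate a word embedding by the normalized mean
    of a few image feature vectors of the novel object."""
    dim = len(image_features[0])
    count = len(image_features)
    mean = [sum(f[i] for f in image_features) / count for i in range(dim)]
    return l2_normalize(mean)


def expand_vocabulary(vocab, embeddings, word, image_features):
    """Append the new word and its estimated embedding.
    Existing embeddings are left untouched, so nothing is retrained."""
    return vocab + [word], embeddings + [estimate_embedding(image_features)]


def score_words(state, embeddings):
    """Assumption: the captioner scores each word by a dot product
    between its decoder state and that word's embedding."""
    return [sum(s * e for s, e in zip(state, emb)) for emb in embeddings]


# Toy vocabulary with hypothetical 3-dimensional embeddings.
vocab = ["dog", "car"]
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# A few hypothetical image features of the novel object.
zebra_features = [[0.1, 0.0, 0.9], [0.0, 0.1, 1.1]]
vocab, embeddings = expand_vocabulary(vocab, embeddings, "zebra", zebra_features)

# A hypothetical decoder state that points toward the novel object.
state = [0.0, 0.0, 1.0]
scores = score_words(state, embeddings)
best = vocab[max(range(len(vocab)), key=scores.__getitem__)]
print(best)
```

Because the new row is appended rather than learned, the existing embeddings stay fixed, which is the sense in which a vocabulary can be extended without retraining under these assumptions.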