Paper title
Image Captioning with Attention for Smart Local Tourism using EfficientNet
Paper authors
Paper abstract
Smart systems have been developed at scale to help humans with various tasks. Driven by the explosion of data lakes, deep learning technologies push even further toward accurate assistant systems. One task of a smart system is to disseminate the information users need, which is crucial in the tourism sector for promoting local tourism destinations. In this research, we design an image captioning model specific to local tourism, which will later support the development of AI-powered systems that assist various users. The model is developed using a visual attention mechanism and the state-of-the-art feature extractor architecture EfficientNet. A local tourism dataset is collected and used in the research, along with two kinds of captions: captions that describe the image literally, and captions that represent a human's logical response on seeing the image. This is done to make the captioning model more human-like when implemented in the assistance system. We compare the performance of two models using EfficientNet architectures (B0 and B4) with the well-known VGG16 and InceptionV3. The best BLEU scores we obtain are 73.39 on the training set and 24.51 on the validation set, using EfficientNetB0. The captioning results show that the developed model can produce logical captions for local tourism-related images.
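For readers unfamiliar with the metric, the following is a minimal sketch of how a BLEU score on a 0-100 scale can be computed for one generated caption. The abstract does not state which BLEU variant was used, so everything here is an assumption for illustration: sentence-level scoring, n-grams up to order 4 with uniform weights, no smoothing, and whitespace tokenization. The function names `ngrams` and `bleu` and the example captions are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU on a 0-100 scale, without smoothing (assumed variant).

    candidate: list of tokens; references: list of token lists.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical captions, made up for illustration only.
generated = "a temple beside the lake".split()
literal_reference = "a temple beside the lake at sunset".split()
print(round(bleu(generated, [literal_reference]), 2))
```

Because the dataset pairs each image with both a literal caption and a human-response caption, both could be passed together in `references`; whether the authors scored captions this way is not stated in the abstract.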