Paper Title
A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning
Paper Authors
Paper Abstract
We deal with the problem of generating textual captions from optical remote sensing (RS) images using the notion of deep reinforcement learning. Due to the high inter-class similarity among the reference sentences describing remote sensing data, jointly encoding the sentences and images encourages the prediction of captions that are, in many cases, semantically more precise than the ground truth. To this end, we introduce an Actor Dual-Critic training strategy in which a second critic, realized as an encoder-decoder RNN, encodes the latent information corresponding to the original and generated captions. While all actor-critic methods use an actor to predict sentences for an image and a critic to provide rewards, our proposed encoder-decoder RNN additionally ensures a high-level comprehension of images through sentence-to-image translation. We observe that the proposed model generates sentences on the test data that are highly similar to the ground truth, and in many critical cases it even produces better captions. Extensive experiments on the benchmark Remote Sensing Image Captioning Dataset (RSICD) and the UCM-captions dataset confirm the superiority of the proposed approach over previous state-of-the-art methods, with sharp gains in both the ROUGE-L and CIDEr measures.
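The sketch below illustrates, in PyTorch-style Python, one way the dual-critic reward described in the abstract could be assembled: a sentence encoder (standing in for the encoder-decoder RNN critic) maps the generated and reference captions into a shared latent space, and their similarity is mixed with the standard value critic's score. The class SentenceEncoder, the cosine-similarity reward, and the mixing weight alpha are assumptions for illustration only; the paper's actual architecture and reward formulation may differ.

import torch
import torch.nn as nn


class SentenceEncoder(nn.Module):
    """Hypothetical second critic: encodes a caption into a latent vector."""

    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer ids -> latent (batch, hidden_dim)
        _, h = self.rnn(self.embed(tokens))
        return h.squeeze(0)


def dual_critic_reward(
    encoder: SentenceEncoder,
    value_critic_score: torch.Tensor,  # reward from the standard value critic
    generated_tokens: torch.Tensor,    # caption sampled by the actor
    reference_tokens: torch.Tensor,    # ground-truth caption
    alpha: float = 0.5,                # assumed mixing weight, not from the paper
) -> torch.Tensor:
    """Combine the value critic's score with a latent-similarity reward
    computed from the second critic's caption encodings."""
    z_gen = encoder(generated_tokens)
    z_ref = encoder(reference_tokens)
    similarity = nn.functional.cosine_similarity(z_gen, z_ref, dim=-1)
    return alpha * value_critic_score + (1.0 - alpha) * similarity

The design intuition is that the second critic rewards captions whose latent encoding stays close to that of the ground truth, which pushes the actor toward semantically faithful descriptions even when the exact wording differs from the reference.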