Title
Neural Twins Talk
Authors
Abstract
Inspired by how the human brain employs more neural pathways when increasing its focus on a subject, we introduce a novel twin cascaded attention model that outperforms a state-of-the-art image captioning model originally implemented with a single attention channel for the visual grounding task. Visual grounding ensures that the words in a caption sentence are grounded in particular regions of the input image. After a deep learning model is trained on the visual grounding task, it employs the learned patterns of visual grounding and object ordering in the caption sentences when generating captions. We report the results of our experiments on three image captioning tasks on the COCO dataset. The results are reported using standard image captioning metrics to show the improvements our model achieves over the previous image captioning model. The results gathered from our experiments suggest that employing more parallel attention pathways in a deep neural network leads to higher performance. Our implementation of Neural Twins Talk (NTT) is publicly available at: https://github.com/zanyarz/NeuralTwinsTalk.
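To make the abstract's core idea concrete, the sketch below shows what "twin" parallel attention pathways over image region features could look like. This is not the authors' NTT architecture; it is a minimal, hypothetical PyTorch illustration in which two independent additive-attention pathways attend to the same region features and their attended context vectors are fused before being passed to a caption decoder. All names (`TwinAttention`, `twin_a`, `twin_b`, `fuse`) and dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwinAttention(nn.Module):
    """Illustrative twin (parallel) additive attention over image regions.

    NOT the authors' exact NTT model; a sketch of the abstract's idea:
    two parallel attention pathways attend to the same region features,
    and their attended context vectors are fused for the decoder.
    """

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        # Two independent attention pathways (the "twins").
        self.twin_a = self._make_pathway(feat_dim, hidden_dim, attn_dim)
        self.twin_b = self._make_pathway(feat_dim, hidden_dim, attn_dim)
        # Fuse the two attended context vectors back to feat_dim.
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    @staticmethod
    def _make_pathway(feat_dim, hidden_dim, attn_dim):
        return nn.ModuleDict({
            "proj_feat": nn.Linear(feat_dim, attn_dim),
            "proj_hidden": nn.Linear(hidden_dim, attn_dim),
            "score": nn.Linear(attn_dim, 1),
        })

    @staticmethod
    def _attend(pathway, regions, hidden):
        # regions: (batch, num_regions, feat_dim); hidden: (batch, hidden_dim)
        energy = torch.tanh(
            pathway["proj_feat"](regions)
            + pathway["proj_hidden"](hidden).unsqueeze(1)
        )                                    # (batch, num_regions, attn_dim)
        weights = F.softmax(pathway["score"](energy).squeeze(-1), dim=-1)
        # Weighted sum over regions -> one context vector per example.
        return torch.bmm(weights.unsqueeze(1), regions).squeeze(1)

    def forward(self, regions, hidden):
        ctx_a = self._attend(self.twin_a, regions, hidden)
        ctx_b = self._attend(self.twin_b, regions, hidden)
        return self.fuse(torch.cat([ctx_a, ctx_b], dim=-1))

# Toy usage: 36 region features of size 2048, decoder hidden size 512.
attn = TwinAttention(feat_dim=2048, hidden_dim=512, attn_dim=256)
regions = torch.randn(4, 36, 2048)
hidden = torch.randn(4, 512)
context = attn(regions, hidden)  # (4, 2048), fed to the caption decoder
```

The design choice illustrated here is the one the abstract argues for: adding a second parallel attention pathway (rather than widening a single one) gives the model multiple simultaneous "views" of the image regions, echoing the brain-inspired motivation of recruiting more pathways as focus increases.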