Guidance Module Network for Video Captioning

Paper Title

Guidance Module Network for Video Captioning

Authors

Zhang, Xiao; Liu, Chunsheng; Chang, Faliang

Abstract

Video captioning is a challenging and important task: describing the content of a video clip in a single sentence. Video captioning models usually follow an encoder-decoder design. We find that normalizing the extracted video features improves the final captioning performance. Encoder-decoder models are usually trained with teacher forcing, which pushes the predicted probability of each word toward a 0-1 distribution and ignores other words. In this paper, we present a novel architecture that introduces a guidance module to encourage the encoder-decoder model to generate words related to the past and future words in a caption. Based on the normalization and the guidance module, we build the guidance module net (GMNet). Experimental results on the commonly used MSVD dataset show that the proposed GMNet improves the performance of the encoder-decoder model on the video captioning task.
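The two baseline ingredients named in the abstract, feature normalization and teacher-forced training against a 0-1 target, can be illustrated with a small sketch. The abstract does not say which normalization is used, so L2 normalization here is an assumption, and the function names and toy values are hypothetical. The sketch does not attempt to reproduce the guidance module itself.

```python
import math

def l2_normalize(feature, eps=1e-12):
    """Scale a feature vector to unit L2 norm (assumed normalization choice)."""
    norm = math.sqrt(sum(x * x for x in feature))
    return [x / max(norm, eps) for x in feature]

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def teacher_forcing_loss(step_logits, target_ids):
    """Mean cross-entropy against one-hot (0-1) targets, one per time step.

    Only the probability of the ground-truth word contributes at each step,
    which is the "ignore other words" behavior the abstract describes.
    """
    losses = []
    for logits, target in zip(step_logits, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Hypothetical toy data: one 2-D frame feature, a 3-word vocabulary,
# and a 2-word ground-truth caption.
frame_feature = [3.0, 4.0]
normalized = l2_normalize(frame_feature)

step_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
target_ids = [0, 1]
loss = teacher_forcing_loss(step_logits, target_ids)
```

Because the loss depends only on the ground-truth word at each step, it carries no signal about how a predicted word relates to the rest of the caption, which is the gap the guidance module is described as addressing.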
