Paper Title

Clue: Cross-modal Coherence Modeling for Caption Generation

Authors

Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, Matthew Stone

Abstract

We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning. Using an annotation protocol specifically devised for capturing image--caption coherence relations, we annotate 10,000 instances from publicly-available image--caption pairs. We introduce a new task for learning inferences in imagery and text, coherence relation prediction, and show that these coherence annotations can be exploited to learn relation classifiers as an intermediary step, and also train coherence-aware, controllable image captioning models. The results show a dramatic improvement in the consistency and quality of the generated captions with respect to information needs specified via coherence relations.
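The coherence relation prediction task described above can be framed as multi-class classification over image--caption pairs. The sketch below illustrates that framing only; the relation label set, the feature representations, and the linear scorer are assumptions made for illustration, not the paper's actual taxonomy or model architecture.

```python
# Illustrative sketch: coherence relation prediction as multi-class
# classification over an image--caption pair. Labels, features, and the
# linear scorer are hypothetical stand-ins, not the paper's method.
import math
from dataclasses import dataclass
from typing import List

# Assumed relation labels for illustration (not necessarily the paper's set).
RELATIONS = ["Visible", "Action", "Subjective", "Story", "Meta"]

@dataclass
class Example:
    image_feats: List[float]    # e.g. pooled image-encoder features
    caption_feats: List[float]  # e.g. pooled text-encoder features

def score_relations(ex: Example, weights: List[List[float]]) -> List[float]:
    """Linear scorer over concatenated image+caption features;
    one weight row per relation label."""
    x = ex.image_feats + ex.caption_feats
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over raw relation scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def predict_relation(ex: Example, weights: List[List[float]]) -> str:
    """Return the highest-probability coherence relation label."""
    probs = softmax(score_relations(ex, weights))
    return RELATIONS[max(range(len(probs)), key=probs.__getitem__)]
```

In the coherence-aware captioning setting, the predicted (or desired) relation label would additionally condition the caption generator, making the output controllable with respect to the information need the relation encodes.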
