Paper Title

Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO

Authors

Zarana Parekh, Jason Baldridge, Daniel Cer, Austin Waters, Yinfei Yang

Abstract

By supporting multi-modal retrieval training and evaluation, image captioning datasets have spurred remarkable progress on representation learning. Unfortunately, datasets have limited cross-modal associations: images are not paired with other images, captions are only paired with other captions of the same image, there are no negative associations and there are missing positive cross-modal associations. This undermines research into how inter-modality learning impacts intra-modality tasks. We address this gap with Crisscrossed Captions (CxC), an extension of the MS-COCO dataset with human semantic similarity judgments for 267,095 intra- and inter-modality pairs. We report baseline results on CxC for strong existing unimodal and multimodal models. We also evaluate a multitask dual encoder trained on both image-caption and caption-caption pairs that crucially demonstrates CxC's value for measuring the influence of intra- and inter-modality learning.
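The multitask dual encoder described in the abstract is commonly trained with an in-batch contrastive (softmax) loss over paired embeddings. The following is a minimal numpy sketch of that general technique, not the paper's actual implementation; the function name, temperature value, and batching details are illustrative assumptions:

```python
import numpy as np

def inbatch_contrastive_loss(image_emb, text_emb, temperature=0.1):
    """Symmetric in-batch softmax loss for a dual encoder (illustrative sketch).

    image_emb, text_emb: (batch, dim) L2-normalized embeddings. Row i of each
    matrix is a matched image-caption pair; every other row in the batch
    serves as a negative for it.
    """
    # Scaled cosine similarities between all pairs in the batch.
    logits = image_emb @ text_emb.T / temperature  # shape (batch, batch)
    idx = np.arange(logits.shape[0])               # matched pairs lie on the diagonal

    def xent(l):
        # Cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average the two retrieval directions: image->text and text->image.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Training on both image-caption and caption-caption pairs, as the abstract describes, amounts to applying a loss of this shape to each pair type and summing the resulting objectives.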
