Paper Title

Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval

Paper Authors

Christopher Thomas, Adriana Kovashka

Paper Abstract

The abundance of multimodal data (e.g. social media posts) has inspired interest in cross-modal retrieval methods. Popular approaches rely on a variety of metric learning losses, which prescribe what the proximity of image and text should be, in the learned space. However, most prior methods have focused on the case where image and text convey redundant information; in contrast, real-world image-text pairs convey complementary information with little overlap. Further, images in news articles and media portray topics in a visually diverse fashion; thus, we need to take special care to ensure a meaningful image representation. We propose novel within-modality losses which encourage semantic coherency in both the text and image subspaces, which does not necessarily align with visual coherency. Our method ensures that not only are paired images and texts close, but the expected image-image and text-text relationships are also observed. Our approach improves the results of cross-modal retrieval on four datasets compared to five baselines.

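The abstract describes combining a standard cross-modal metric-learning objective with within-modality terms that keep semantically related images (and texts) close to one another in the learned space. The sketch below is a minimal PyTorch illustration of that idea, not the paper's exact losses: the triplet/hinge form, the margins, the in-batch hard-negative mining, and the thresholded text-text similarity used to define "semantic neighbors" are all assumptions made for this example.

```python
# Illustrative sketch of cross-modal + within-modality metric learning.
# The specific loss forms, margins, and neighbor definition are assumptions,
# not the authors' exact formulation.
import torch
import torch.nn.functional as F


def cross_modal_triplet(img, txt, margin=0.2):
    """Image-text triplet loss with in-batch hard negatives.

    img, txt: (B, D) L2-normalized embeddings; row i of img is paired with row i of txt.
    """
    sim = img @ txt.t()                                    # (B, B) cosine similarities
    pos = sim.diag()                                       # similarity of true pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(mask, -1.0)                      # exclude the positive pair
    loss_i2t = F.relu(margin + neg.max(dim=1).values - pos).mean()
    loss_t2i = F.relu(margin + neg.max(dim=0).values - pos).mean()
    return loss_i2t + loss_t2i


def within_modality_loss(emb, neighbor_mask, margin=0.2):
    """Pull semantic neighbors together, push non-neighbors apart, within one modality.

    neighbor_mask[i, j] = True if items i and j are semantic neighbors,
    e.g. because their accompanying texts are similar (an assumption here).
    """
    sim = emb @ emb.t()
    self_mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    # Hardest (least similar) neighbor vs. hardest (most similar) non-neighbor.
    pos_sim = sim.masked_fill(~neighbor_mask | self_mask, 2.0).min(dim=1).values
    neg_sim = sim.masked_fill(neighbor_mask | self_mask, -1.0).max(dim=1).values
    return F.relu(margin + neg_sim - pos_sim).mean()


if __name__ == "__main__":
    B, D = 8, 256
    img = F.normalize(torch.randn(B, D), dim=1)            # stand-in image embeddings
    txt = F.normalize(torch.randn(B, D), dim=1)            # stand-in text embeddings
    # Pretend semantic neighbors were found by thresholding text-text similarity.
    neighbor_mask = (txt @ txt.t()) > 0.0
    total = (cross_modal_triplet(img, txt)
             + within_modality_loss(img, neighbor_mask)    # image-image coherency
             + within_modality_loss(txt, neighbor_mask))   # text-text coherency
    print(total.item())
```

In this toy setup the within-modality terms use the same neighbor mask for both subspaces, reflecting the abstract's point that semantic (text-driven) coherency, rather than purely visual coherency, organizes the image embedding space.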