Paper Title
Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization
Paper Authors
Paper Abstract
Matching information across the image and text modalities is a fundamental challenge for many applications that involve both vision and natural language processing. The objective is to find an effective similarity metric for comparing visual and textual information. Existing approaches mainly match local visual objects and sentence words in a shared space with attention mechanisms. The matching performance remains limited because the similarity computation relies on simple comparisons of the matched features, ignoring the characteristics of their distribution in the data. In this paper, we address this limitation with an efficient learning objective that considers the discriminative feature distributions between the visual objects and the sentence words. Specifically, we propose a novel Adversarial Discriminative Domain Regularization (ADDR) learning framework that goes beyond the paradigmatic metric learning objective to construct a set of discriminative data domains within each image-text pair. Our approach can generally improve the learning efficiency and the performance of existing metric learning frameworks by regulating the distribution of the hidden space between matching pairs. Experimental results show that this new approach significantly improves the overall performance of several popular cross-modal matching techniques (SCAN, VSRN, BFAN) on the MS-COCO and Flickr30K benchmarks.
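To make the general idea concrete, below is a minimal, hypothetical PyTorch sketch of how an adversarial domain-regularization term can be added on top of a standard hinge-based triplet ranking loss for image-text matching. All names (`DomainDiscriminator`, `triplet_ranking_loss`, `addr_style_objective`), dimensions, and the loss weighting `lam` are illustrative assumptions, not the authors' exact ADDR formulation; the sketch only shows one common way to regulate the distribution of a shared hidden space with an adversarial discriminator.

```python
# Hypothetical sketch: metric learning loss + adversarial regularization of the
# shared embedding space. Assumes image and text features are already encoded
# into the same dimensionality by upstream encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainDiscriminator(nn.Module):
    """Predicts whether a hidden vector comes from the image or the text encoder."""

    def __init__(self, dim=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, h):
        return self.net(h).squeeze(-1)  # logits: > 0 read as "image domain"


def triplet_ranking_loss(img, txt, margin=0.2):
    """Hinge-based bidirectional ranking loss over a batch of matched pairs."""
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    scores = img @ txt.t()                               # (B, B) cosine similarities
    pos = scores.diag().unsqueeze(1)                      # similarity of matched pairs
    cost_s = (margin + scores - pos).clamp(min=0)         # image -> text negatives
    cost_im = (margin + scores - pos.t()).clamp(min=0)    # text -> image negatives
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_s = cost_s.masked_fill(mask, 0)
    cost_im = cost_im.masked_fill(mask, 0)
    # Hardest-negative mining in both retrieval directions.
    return cost_s.max(1)[0].mean() + cost_im.max(0)[0].mean()


def addr_style_objective(img, txt, discriminator, lam=0.1):
    """Metric-learning loss plus an adversarial domain-regularization term.

    The encoders are trained to confuse the discriminator so that the image and
    text feature distributions in the shared space are regularized toward each
    other; the discriminator itself would be trained in a separate step on
    detached features.
    """
    match_loss = triplet_ranking_loss(img, txt)
    logits = discriminator(torch.cat([img, txt], dim=0))
    labels = torch.cat([
        torch.ones(img.size(0), device=img.device),
        torch.zeros(txt.size(0), device=txt.device),
    ])
    # Encoder objective: flipped labels, so the encoders try to fool the discriminator.
    adv_loss = F.binary_cross_entropy_with_logits(logits, 1.0 - labels)
    return match_loss + lam * adv_loss
```

In this reading, the ranking loss plays the role of the baseline metric learning objective used by methods such as SCAN, VSRN, or BFAN, while the discriminator term is the additional regularizer acting on the hidden-space distribution; how the discriminative domains are constructed per image-text pair in ADDR itself is specified in the paper, not in this sketch.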