Paper title
Weakly supervised cross-domain alignment with optimal transport
Paper authors
Paper abstract
Cross-domain alignment between image objects and text sequences is key to many visual-language tasks, and it poses a fundamental challenge to both computer vision and natural language processing. This paper investigates a novel approach for the identification and optimization of fine-grained semantic similarities between image and text entities, under a weakly-supervised setup, improving performance over state-of-the-art solutions. Our method builds upon recent advances in optimal transport (OT) to resolve the cross-domain matching problem in a principled manner. Formulated as a drop-in regularizer, the proposed OT solution can be efficiently computed and used in combination with other existing approaches. We present empirical evidence to demonstrate the effectiveness of our approach, showing how it enables simpler model architectures to outperform or be comparable with more sophisticated designs on a range of vision-language tasks.
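To make the idea of an OT-based alignment term concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name a solver, a cost function, or marginals, so everything here is an assumption: entropy-regularized Sinkhorn iterations stand in for whatever OT solver the paper uses, the cost is cosine distance, and both marginals are uniform. The function names (`cosine_cost`, `sinkhorn_plan`, `ot_alignment_loss`) and the parameters `eps` and `n_iters` are hypothetical.

```python
import math


def cosine_cost(xs, ys):
    """Cost matrix C[i][j] = 1 - cos(xs[i], ys[j]) (assumed cost, not from the paper)."""
    def unit(v):
        norm = math.sqrt(sum(a * a for a in v))
        return [a / norm for a in v]

    xs = [unit(x) for x in xs]
    ys = [unit(y) for y in ys]
    return [[1.0 - sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport plan between uniform marginals.

    Sinkhorn is used here as a stand-in solver; eps is the assumed
    entropic regularization strength.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass over image regions (assumption)
    b = [1.0 / m] * m  # uniform mass over words (assumption)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_alignment_loss(image_feats, text_feats, eps=0.1):
    """Return (<plan, cost>, plan): the transport cost and the soft alignment."""
    cost = cosine_cost(image_feats, text_feats)
    plan = sinkhorn_plan(cost, eps)
    loss = sum(p * c for p_row, c_row in zip(plan, cost) for p, c in zip(p_row, c_row))
    return loss, plan


# Toy example: two region features and two word features (made-up values).
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.1], [0.1, 1.0]]
loss, plan = ot_alignment_loss(regions, words)
```

Used as a regularizer, such a term would presumably be added to an existing task objective with a weighting coefficient, for example `total = task_loss + lam * loss`, where `lam` is a hypothetical hyperparameter. The plan itself gives a soft region-to-word matching without requiring region-level annotations, which is what makes this kind of term compatible with a weakly supervised setup.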