Paper Title
Graph Optimal Transport for Cross-Domain Alignment
Paper Authors
Paper Abstract
Cross-domain alignment between two sets of entities (e.g., objects in an image, words in a sentence) is fundamental to both computer vision and natural language processing. Existing methods mainly focus on designing advanced attention mechanisms to simulate soft alignment, with no training signals to explicitly encourage alignment. The learned attention matrices are also dense and lack interpretability. We propose Graph Optimal Transport (GOT), a principled framework that builds on recent advances in Optimal Transport (OT). In GOT, cross-domain alignment is formulated as a graph matching problem, by representing entities as a dynamically-constructed graph. Two types of OT distance are considered: (i) Wasserstein distance (WD) for node (entity) matching; and (ii) Gromov-Wasserstein distance (GWD) for edge (structure) matching. Both WD and GWD can be incorporated into existing neural network models, effectively acting as drop-in regularizers. The inferred transport plans also yield sparse and self-normalized alignments, enhancing the interpretability of the learned model. Experiments show that GOT consistently outperforms baselines across a wide range of tasks, including image-text retrieval, visual question answering, image captioning, machine translation, and text summarization.
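To make the node-matching component concrete, below is a minimal NumPy sketch of entropic-regularized optimal transport solved with Sinkhorn iterations, the standard way a differentiable Wasserstein-distance term is computed between two sets of entity embeddings. This is an illustrative sketch, not the paper's exact algorithm: the function name `sinkhorn`, the uniform marginals, and the hyperparameters (`eps`, `n_iters`) are assumptions for demonstration.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=100):
    """Entropic-regularized OT between uniform marginals.

    cost: (n, m) pairwise cost matrix between two sets of entities
          (e.g., cosine distances between node embeddings).
    Returns the (n, m) transport plan T; <T, cost> approximates the
    Wasserstein distance used as an alignment regularizer.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # uniform marginal over source entities
    b = np.full(m, 1.0 / m)      # uniform marginal over target entities
    K = np.exp(-cost / eps)      # Gibbs kernel from the cost matrix
    u = np.ones(n)
    for _ in range(n_iters):     # alternate scaling to match marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Example: WD-style alignment loss between two small embedding sets
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8))              # e.g., image-region features
y = rng.standard_normal((4, 8))              # e.g., word features
x /= np.linalg.norm(x, axis=1, keepdims=True)
y /= np.linalg.norm(y, axis=1, keepdims=True)
cost = 1.0 - x @ y.T                         # cosine-distance cost
T = sinkhorn(cost)
wd_loss = float((T * cost).sum())            # scalar regularizer term
```

The resulting plan `T` is non-negative, its rows and columns sum to the prescribed marginals, and with a small `eps` it becomes near-sparse, which is what gives the self-normalized, interpretable alignments mentioned in the abstract.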