Paper Title
Geometric Multimodal Contrastive Representation Learning
Authors
Abstract
Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem, due to the inherent heterogeneity of data obtained from different channels. To address this, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture with modality-specific base encoders, which map an arbitrary number of modalities to intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.
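To make the second component concrete, the sketch below illustrates one common way such a multimodal contrastive alignment objective can be realized: an NT-Xent-style loss that pulls the modality-specific embedding of each sample toward the joint (complete-modality) embedding of the same sample, while pushing it away from other samples in the batch. This is a generic, hedged sketch in NumPy, not the exact GMC objective from the paper; the function names (`contrastive_alignment_loss`), temperature value, and batch setup are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x):
    # L2-normalize rows so similarities are cosine similarities
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive_alignment_loss(z_joint, z_mod, temperature=0.1):
    """NT-Xent-style alignment loss (illustrative sketch, not the
    paper's exact objective): each modality-specific embedding is
    attracted to the joint embedding of the same sample (diagonal)
    and repelled from other samples' joint embeddings (off-diagonal)."""
    z_joint = l2_normalize(z_joint)
    z_mod = l2_normalize(z_mod)
    logits = (z_joint @ z_mod.T) / temperature   # pairwise similarities
    idx = np.arange(len(z_joint))                 # positives on the diagonal
    # row-wise log-softmax, then negative log-likelihood of the positives
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, idx].mean()

# usage: batch of 4 samples with 8-dimensional embeddings
rng = np.random.default_rng(0)
z_joint = rng.normal(size=(4, 8))
# a modality embedding nearly aligned with the joint one ...
z_aligned = z_joint + 0.01 * rng.normal(size=(4, 8))
# ... versus an unrelated one
z_random = rng.normal(size=(4, 8))
loss_aligned = contrastive_alignment_loss(z_joint, z_aligned)
loss_random = contrastive_alignment_loss(z_joint, z_random)
```

Geometrically aligned representations yield a much lower loss than unrelated ones, which is the property that lets a single modality stand in for the full set at test time.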