Paper Title
Generalizing Multimodal Variational Methods to Sets
Paper Authors
Paper Abstract
Making sense of multiple modalities can yield a more comprehensive description of real-world phenomena. However, learning a co-representation of diverse modalities remains a long-standing endeavor in emerging machine learning applications and research. Previous generative approaches to multimodal input approximate the joint-modality posterior by uni-modal posteriors, combined as a product of experts (PoE) or a mixture of experts (MoE). We argue that these approximations yield a defective bound for the optimization process and a loss of semantic connection among modalities. This paper presents a novel variational method on sets, called the Set Multimodal VAE (SMVAE), for learning a multimodal latent space while handling the missing-modality problem. By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensates for the drawbacks caused by factorization. On public datasets from various domains, experimental results demonstrate that the proposed method is applicable to order-agnostic cross-modal generation while achieving outstanding performance compared with state-of-the-art multimodal methods. The source code for our method is available online at https://anonymous.4open.science/r/SMVAE-9B3C/.
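For context, the two aggregation schemes the abstract argues against can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming each uni-modal encoder outputs a diagonal-Gaussian posterior (mean and log-variance); it is not the paper's SMVAE model, which avoids this factorization.

```python
import numpy as np

def product_of_experts(mus, logvars):
    """PoE: combine uni-modal Gaussian posteriors q(z|x_i) into one
    Gaussian by precision-weighted averaging (product of Gaussians
    is Gaussian). Inputs have shape (n_modalities, latent_dim)."""
    precisions = np.exp(-np.asarray(logvars))        # 1 / sigma_i^2
    joint_var = 1.0 / precisions.sum(axis=0)         # joint sigma^2
    joint_mu = joint_var * (np.asarray(mus) * precisions).sum(axis=0)
    return joint_mu, np.log(joint_var)

def mixture_of_experts_sample(mus, logvars, rng):
    """MoE: sample from a uniform mixture of the uni-modal posteriors
    by picking one expert at random and sampling from its Gaussian."""
    mus, logvars = np.asarray(mus), np.asarray(logvars)
    k = rng.integers(len(mus))                       # choose an expert
    eps = rng.standard_normal(mus.shape[1])
    return mus[k] + np.exp(0.5 * logvars[k]) * eps
```

For example, two unit-variance experts with means 0 and 2 give a PoE posterior with mean 1 and variance 0.5, i.e. the joint estimate is sharper than either expert; the MoE sample instead comes from a single randomly chosen expert, which is one way the per-modality factorization can weaken the bound.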