Paper Title
Deep Multimodal Fusion by Channel Exchanging
Paper Authors
Paper Abstract
Deep multimodal fusion, which uses multiple sources of data for classification or regression, has exhibited a clear advantage over its unimodal counterpart in various applications. Yet current methods, including aggregation-based and alignment-based fusion, remain inadequate at balancing the trade-off between inter-modal fusion and intra-modal processing, creating a bottleneck for performance improvement. To this end, this paper proposes the Channel-Exchanging-Network (CEN), a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities. Specifically, the channel exchanging process is self-guided by individual channel importance, measured by the magnitude of the Batch-Normalization (BN) scaling factor during training. The validity of this exchanging process is further guaranteed by sharing convolutional filters while keeping separate BN layers across modalities, which, as an added benefit, allows our multimodal architecture to be almost as compact as a unimodal network. Extensive experiments on semantic segmentation with RGB-D data and image translation with multi-domain input verify the effectiveness of CEN compared to current state-of-the-art methods. Detailed ablation studies have also been carried out, affirming the advantage of each proposed component. Our code is available at https://github.com/yikaiw/CEN.
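The mechanism summarized in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch-style example and not the authors' implementation (see the linked repository for that): the class name `SharedConvSeparateBN`, the threshold `bn_threshold`, and the two-modality setup are illustrative assumptions, and the sparsity regularization that the paper applies to the BN scaling factors is omitted here.

```python
import torch
import torch.nn as nn


class SharedConvSeparateBN(nn.Module):
    """One fusion stage: a convolution shared across modalities, separate
    per-modality BN layers, and BN-guided channel exchanging (a sketch of
    the idea; names and the threshold value are hypothetical)."""

    def __init__(self, in_ch, out_ch, num_modalities=2, bn_threshold=1e-2):
        super().__init__()
        # A single convolution whose filters are shared by all modalities.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Each modality keeps its own BN layer, hence its own scaling factors.
        self.bns = nn.ModuleList(
            [nn.BatchNorm2d(out_ch) for _ in range(num_modalities)]
        )
        self.bn_threshold = bn_threshold

    def forward(self, xs):
        # xs: one tensor per modality, each of shape (B, C_in, H, W).
        feats = [bn(self.conv(x)) for x, bn in zip(xs, self.bns)]
        out = []
        for m, (feat, bn) in enumerate(zip(feats, self.bns)):
            # A channel whose BN scaling factor is near zero carries little
            # modality-specific signal; replace it with the mean of the other
            # modalities' features at the same channel positions.
            replace = (bn.weight.abs() < self.bn_threshold).view(1, -1, 1, 1)
            others = torch.stack(
                [f for k, f in enumerate(feats) if k != m]
            ).mean(dim=0)
            out.append(torch.where(replace, others, feat))
        return out


# Usage sketch: one stage applied to RGB and depth inputs.
stage = SharedConvSeparateBN(in_ch=3, out_ch=16)
rgb, depth = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
fused_rgb, fused_depth = stage([rgb, depth])
```

The design choice mirrored here is the one the abstract emphasizes: the convolution is shared across modalities while each modality keeps its own BN statistics, so a near-zero BN scaling factor marks a channel that can be overwritten with information from the other modality without disturbing intra-modal processing.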