GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition

Paper Title

GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition

Authors

Li, Jiang; Wang, Xiaoping; Lv, Guoqing; Zeng, Zhigang

Abstract

Emotion Recognition in Conversation (ERC) plays a significant part in Human-Computer Interaction (HCI) systems since it can provide empathetic services. Multimodal ERC can mitigate the drawbacks of uni-modal approaches. Recently, Graph Neural Networks (GNNs) have been widely used in a variety of fields due to their superior performance in relation modeling. In multimodal ERC, GNNs are capable of extracting both long-distance contextual information and inter-modal interactive information. Unfortunately, since existing methods such as MMGCN directly fuse multiple modalities, redundant information may be generated and diverse information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information. GraphCFC alleviates the heterogeneity gap problem in multimodal fusion by utilizing multiple subspace extractors and a Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately when performing message passing. Furthermore, we design a GNN structure called GAT-MLP, which can provide a new unified network framework for multimodal learning. The experimental results on two benchmark datasets show that our GraphCFC outperforms the state-of-the-art (SOTA) approaches.
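
The abstract mentions extracting several types of edges from a directed conversation graph but does not say how those edges are defined. The sketch below is only an illustration of the general idea, not the authors' construction. It assumes one node per (utterance, modality) pair, inter-modal edges between the modalities of the same utterance, and intra-modal context edges within a sliding window. The function name `build_typed_edges`, the `window` parameter, the three modality names, and the edge-type labels are all hypothetical.

```python
from itertools import permutations

# Assumed modality set; the abstract does not list the modalities used.
MODALITIES = ("text", "audio", "visual")


def build_typed_edges(num_utterances, window=2):
    """Return {(src_node, dst_node): edge_type} for a directed graph.

    Nodes are (utterance_index, modality) pairs. This is an illustrative
    edge-typing scheme, not the one defined in the paper.
    """
    edges = {}
    for i in range(num_utterances):
        # Inter-modal edges: link the modalities of the same utterance,
        # one directed edge per ordered modality pair.
        for m_src, m_dst in permutations(MODALITIES, 2):
            edges[((i, m_src), (i, m_dst))] = f"inter:{m_src}->{m_dst}"

        # Intra-modal context edges: utterance j sends a message to
        # utterance i when j lies within `window` positions of i.
        lo = max(0, i - window)
        hi = min(num_utterances, i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            # "past" means the source utterance precedes the target.
            direction = "past" if j < i else "future"
            for m in MODALITIES:
                edges[((j, m), (i, m))] = f"intra:{m}:{direction}"
    return edges


if __name__ == "__main__":
    typed = build_typed_edges(num_utterances=3, window=1)
    for (src, dst), edge_type in sorted(typed.items()):
        print(src, "->", dst, edge_type)
```

With typed edges of this kind, a GNN layer could learn a separate embedding per edge type and use it during message passing, which is one plausible reading of "extracting edges for encoding" in the abstract.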

