Paper Title

Iterative Context-Aware Graph Inference for Visual Dialog

Paper Authors

Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang

Paper Abstract

Visual dialog is a challenging task that requires comprehending the semantic dependencies among implicit visual and textual contexts. This task can be framed as relation inference in a graphical model with sparse contexts and an unknown graph structure (relation descriptor), and how to model the underlying context-aware relation inference is critical. To this end, we propose a novel Context-Aware Graph (CAG) neural network. Each node in the graph corresponds to a joint semantic feature, including both object-based (visual) and history-related (textual) context representations. The graph structure (relations in dialog) is iteratively updated using an adaptive top-$K$ message passing mechanism. Specifically, in every message passing step, each node selects the $K$ most relevant nodes and receives messages only from them. Then, after the update, we impose graph attention on all the nodes to obtain the final graph embedding and infer the answer. In CAG, each node has dynamic relations in the graph (different related $K$ neighbor nodes), and only the most relevant nodes contribute to the context-aware relational graph inference. Experimental results on the VisDial v0.9 and v1.0 datasets show that CAG outperforms comparative methods. Visualization results further validate the interpretability of our method.
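The core mechanism the abstract describes — each node exchanging messages only with its $K$ most relevant neighbors over several rounds, followed by graph attention to pool a final embedding — can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation: the relation scores here are plain dot-product similarities and the node update is a simple average, whereas CAG learns these components; the function names and the query-conditioned pooling are likewise assumptions for the sketch.

```python
import numpy as np

def topk_message_passing(nodes, K, steps):
    """Iteratively update node features; each node receives messages
    only from its K most relevant neighbors (dot-product relevance,
    an assumption standing in for CAG's learned relation scores)."""
    for _ in range(steps):
        scores = nodes @ nodes.T                 # pairwise relevance (N, N)
        np.fill_diagonal(scores, -np.inf)        # no self-messages
        new_nodes = np.empty_like(nodes)
        for i in range(nodes.shape[0]):
            topk = np.argsort(scores[i])[-K:]    # indices of the K most relevant nodes
            w = np.exp(scores[i, topk] - scores[i, topk].max())
            w /= w.sum()                         # softmax attention over the K neighbors
            msg = w @ nodes[topk]                # aggregated incoming message
            new_nodes[i] = 0.5 * (nodes[i] + msg)  # simple residual-style update (assumption)
        nodes = new_nodes
    return nodes

def graph_attention_pool(nodes, query):
    """Attend over all updated nodes to produce one graph embedding,
    conditioned on a query vector (e.g. a question encoding)."""
    a = nodes @ query
    a = np.exp(a - a.max())
    a /= a.sum()
    return a @ nodes                             # weighted sum of node features
```

Each round recomputes the relevance scores, so a node's $K$ neighbors change as the features evolve — this is what makes the graph structure dynamic rather than fixed up front.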
