Paper Title
Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs
Paper Authors
Paper Abstract
Graph neural networks have been shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, whereas relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-temporal edges, which can constitute multiple types of relations. To address this problem, we propose MTD-GNN, a graph network for predicting temporally-dynamic edges for multiple types of relations. We propose a factorized spatio-temporal graph attention layer to learn dynamic node representations and present a multi-task edge prediction loss that models multiple relations simultaneously. The proposed architecture operates on top of scene graphs, which we obtain from videos through object detection and spatio-temporal linking. Experimental evaluations on Action Genome and CLEVRER show that modeling multiple relations in our temporally-dynamic graph network can be mutually beneficial, outperforming existing static and spatio-temporal graph neural networks, as well as state-of-the-art predicate classification methods.
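To make the two components named in the abstract concrete, here is a minimal PyTorch sketch of (a) a factorized spatio-temporal attention layer that attends spatially within each frame and temporally along each node track in two separate passes, and (b) a multi-task edge loss that sums per-relation classification losses over shared edge representations. All names, shapes, and hyper-parameters (FactorizedSTAttention, multi_task_edge_loss, the per-relation linear heads) are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of the abstract's two ideas; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactorizedSTAttention(nn.Module):
    """Factorized spatio-temporal attention over node features.

    Attention runs in two separate passes: spatially (over the nodes
    within each frame) and temporally (over each node's track across
    frames), instead of jointly over all (frame, node) pairs.
    """

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, N, D) = (frames, nodes per frame, feature dim)
        # Spatial pass: each frame is a "batch" item, attend over its N nodes.
        s, _ = self.spatial(x, x, x)                    # (T, N, D)
        # Temporal pass: each node track is a "batch" item, attend over T steps.
        t_in = s.transpose(0, 1)                        # (N, T, D)
        t, _ = self.temporal(t_in, t_in, t_in)          # (N, T, D)
        return t.transpose(0, 1)                        # (T, N, D)


def multi_task_edge_loss(edge_feats, heads, labels):
    """Sum of per-relation losses over shared edge representations.

    edge_feats: (E, D) features for E candidate edges.
    heads: dict mapping relation name -> nn.Linear(D, num_classes).
    labels: dict mapping relation name -> (E,) integer class targets.
    """
    loss = torch.zeros((), device=edge_feats.device)
    for rel, head in heads.items():
        loss = loss + F.cross_entropy(head(edge_feats), labels[rel])
    return loss
```

The factorization keeps the cost at O(T·N² + N·T²) attention comparisons instead of O((T·N)²) for joint attention, and the shared edge features feeding several relation heads are what allow the multiple relation tasks to inform one another, as the abstract's "mutually beneficial" claim suggests.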