Paper Title
Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs
Paper Authors
Paper Abstract
Graph neural networks have been shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, whereas relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-temporal edges, which can constitute multiple types of relations. To address this problem, we propose MTD-GNN, a graph network for predicting temporally-dynamic edges for multiple types of relations. We propose a factorized spatio-temporal graph attention layer to learn dynamic node representations and present a multi-task edge prediction loss that models multiple relations simultaneously. The proposed architecture operates on top of scene graphs, which we obtain from videos through object detection and spatio-temporal linking. Experimental evaluations on Action Genome and CLEVRER show that modeling multiple relations in our temporally-dynamic graph network can be mutually beneficial, outperforming existing static and spatio-temporal graph neural networks, as well as state-of-the-art predicate classification methods.
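To make the two components named in the abstract concrete, here is a minimal PyTorch sketch of (a) a factorized spatio-temporal attention layer that attends spatially within each frame and temporally along each node track in two separate passes, and (b) a multi-task edge loss that sums per-relation classification losses over shared edge representations. All names, shapes, and hyper-parameters (FactorizedSTAttention, multi_task_edge_loss, the per-relation linear heads) are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of the abstract's two ideas; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactorizedSTAttention(nn.Module):
    """Factorized spatio-temporal attention over node features.

    Attention runs in two separate passes: spatially (over the nodes
    within each frame) and temporally (over each node's track across
    frames), instead of jointly over all (frame, node) pairs.
    """

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, N, D) = (frames, nodes per frame, feature dim)
        # Spatial pass: each frame is a "batch" item, attend over its N nodes.
        s, _ = self.spatial(x, x, x)                    # (T, N, D)
        # Temporal pass: each node track is a "batch" item, attend over T steps.
        t_in = s.transpose(0, 1)                        # (N, T, D)
        t, _ = self.temporal(t_in, t_in, t_in)          # (N, T, D)
        return t.transpose(0, 1)                        # (T, N, D)


def multi_task_edge_loss(edge_feats, heads, labels):
    """Sum of per-relation losses over shared edge representations.

    edge_feats: (E, D) features for E candidate edges.
    heads: dict mapping relation name -> nn.Linear(D, num_classes).
    labels: dict mapping relation name -> (E,) integer class targets.
    """
    loss = torch.zeros((), device=edge_feats.device)
    for rel, head in heads.items():
        loss = loss + F.cross_entropy(head(edge_feats), labels[rel])
    return loss
```

The factorization keeps the cost at O(T·N² + N·T²) attention comparisons instead of O((T·N)²) for joint attention, and the shared edge features feeding several relation heads are what allow the multiple relation tasks to inform one another, as the abstract's "mutually beneficial" claim suggests.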