Paper Title
SVGraph: Learning Semantic Graphs from Instructional Videos
Authors
Abstract
In this work, we focus on generating graphical representations of noisy instructional videos for video understanding. We propose a self-supervised, interpretable approach that does not require any annotations for graphical representations, which would be expensive and time-consuming to collect. We attempt to overcome "black box" learning limitations by presenting Semantic Video Graph (SVGraph), a multi-modal approach that uses narrations to make the learned graphs semantically interpretable. SVGraph (1) relies on the agreement between multiple modalities to learn a unified graphical structure with the help of cross-modal attention and (2) assigns semantic interpretation with the help of Semantic-Assignment, which captures the semantics from video narration. We perform experiments on multiple datasets and demonstrate the interpretability of SVGraph in semantic graph learning.