Title
ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy
Authors
Abstract
Visual relations form the basis of understanding our compositional world, as relationships between visual objects capture key information in a scene. It is then advantageous to learn relations automatically from the data, as learning with predefined labels cannot capture all possible relations. However, current relation learning methods typically require supervision, and are not designed to generalize to scenes with more complicated relational structures than those seen during training. Here, we introduce ViRel, a method for unsupervised discovery and learning of Visual Relations with graph-level analogy. In a setting where scenes within a task share the same underlying relational subgraph structure, our learning method of contrasting isomorphic and non-isomorphic graphs discovers the relations across tasks in an unsupervised manner. Once the relations are learned, ViRel can then retrieve the shared relational graph structure for each task by parsing the predicted relational structure. Using a dataset based on grid-world and the Abstract Reasoning Corpus, we show that our method achieves above 95% accuracy in relation classification, discovers the relation graph structure for most tasks, and further generalizes to unseen tasks with more complicated relational structures.
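The abstract says ViRel contrasts isomorphic and non-isomorphic relation graphs, but it does not say how graphs are represented or compared. The sketch below illustrates only the underlying notion of isomorphism. It is not ViRel's learned method. It assumes a hypothetical representation in which each scene has `n` objects plus a dict that maps ordered object pairs to relation labels, and it tests edge-labelled isomorphism exactly by brute force. The relation names `left-of` and `same-color` and the function `is_isomorphic` are invented for this example.

```python
from itertools import permutations

# Hypothetical representation (for illustration only, not ViRel's format):
# a scene is (n_objects, {(src, dst): relation_label}).


def is_isomorphic(n_a, edges_a, n_b, edges_b):
    """Exact edge-labelled directed-graph isomorphism test by brute force.

    Tries every bijection of object indices, so it is only practical
    for scenes with a handful of objects (n! permutations).
    """
    if n_a != n_b or len(edges_a) != len(edges_b):
        return False
    # Cheap filter: the multisets of relation labels must match.
    if sorted(edges_a.values()) != sorted(edges_b.values()):
        return False
    for perm in permutations(range(n_b)):
        # perm[i] is the object in graph B that object i of graph A maps to.
        # Edge counts are equal and perm is a bijection, so mapping every
        # A-edge onto a same-label B-edge means the edge sets correspond.
        if all(edges_b.get((perm[s], perm[d])) == label
               for (s, d), label in edges_a.items()):
            return True
    return False


# Two scenes with the same relational structure (a left-of edge followed by
# a same-color edge), with objects listed in a different order.
scene_1 = {(0, 1): "left-of", (1, 2): "same-color"}
scene_2 = {(2, 0): "left-of", (0, 1): "same-color"}
# Same labels, but chained in the opposite order: a different structure.
scene_3 = {(0, 1): "same-color", (1, 2): "left-of"}

print(is_isomorphic(3, scene_1, 3, scene_2))  # → True
print(is_isomorphic(3, scene_1, 3, scene_3))  # → False
```

Under these assumptions, `scene_1` and `scene_2` would form an isomorphic pair and `scene_1` and `scene_3` a non-isomorphic one. Enumerating all object permutations only scales to very small scenes. Read this as a definition of the property being contrasted, not as a description of how ViRel is trained.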