Paper Title

GEMS: Scene Expansion using Generative Models of Graphs

Authors

Rishi Agarwal, Tirupati Saketh Chandra, Vaidehi Patil, Aniruddha Mahapatra, Kuldeep Kulkarni, Vishwa Vinay

Abstract

Applications based on image retrieval require editing and associating in intermediate spaces that are representative of the high-level concepts like objects and their relationships rather than dense, pixel-level representations like RGB images or semantic-label maps. We focus on one such representation, scene graphs, and propose a novel scene expansion task where we enrich an input seed graph by adding new nodes (objects) and the corresponding relationships. To this end, we formulate scene graph expansion as a sequential prediction task involving multiple steps of first predicting a new node and then predicting the set of relationships between the newly predicted node and previous nodes in the graph. We propose a sequencing strategy for observed graphs that retains the clustering patterns amongst nodes. In addition, we leverage external knowledge to train our graph generation model, enabling greater generalization of node predictions. Due to the inefficiency of existing maximum mean discrepancy (MMD) based metrics for graph generation problems in evaluating predicted relationships between nodes (objects), we design novel metrics that comprehensively evaluate different aspects of predicted relations. We conduct extensive experiments on Visual Genome and VRD datasets to evaluate the expanded scene graphs using the standard MMD-based metrics and our proposed metrics. We observe that the graphs generated by our method, GEMS, better represent the real distribution of the scene graphs than the baseline methods like GraphRNN.
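The sequential expansion procedure described above (predict a new node, then predict its relationships to all previous nodes, repeated for multiple steps) can be sketched as a simple autoregressive loop. This is a minimal illustration, not the paper's implementation: the predictor functions here are toy stand-ins for the trained node and relationship models, and all names are hypothetical.

```python
# Hypothetical sketch of GEMS-style sequential scene-graph expansion.
# The predictors below are illustrative stand-ins, not the paper's models.

def expand_scene_graph(seed_nodes, seed_edges, predict_node, predict_edge, max_new=3):
    """Autoregressively grow a scene graph: at each step, predict one new
    node (object), then the set of relationships linking it to every
    previously added node."""
    nodes = list(seed_nodes)
    edges = list(seed_edges)
    for _ in range(max_new):
        new_node = predict_node(nodes, edges)   # e.g. an object label
        if new_node is None:                    # treat None as a stop token
            break
        new_idx = len(nodes)
        nodes.append(new_node)
        # Predict relationships between the new node and all prior nodes.
        for prev_idx in range(new_idx):
            rel = predict_edge(nodes, prev_idx, new_idx)
            if rel is not None:
                edges.append((prev_idx, rel, new_idx))
    return nodes, edges

# Toy stand-in predictors for illustration only.
def toy_node_predictor(nodes, edges):
    vocab = ["tree", "sky", "road"]
    k = len(nodes) - 2                  # two seed nodes already present
    return vocab[k] if 0 <= k < len(vocab) else None

def toy_edge_predictor(nodes, prev_idx, new_idx):
    # Connect each new node to the first node with a generic relation.
    return "near" if prev_idx == 0 else None

nodes, edges = expand_scene_graph(
    ["person", "bicycle"], [(0, "riding", 1)],
    toy_node_predictor, toy_edge_predictor)
```

In the paper, the node and edge predictors are learned models and the seed graph is a subgraph of an observed scene graph; the loop structure above is the part the abstract describes.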
