Paper Title

Learning To Generate Scene Graph from Head to Tail

Authors

Chaofan Zheng, Xinyu Lyu, Yuyu Guo, Pengpeng Zeng, Jingkuan Song, Lianli Gao

Abstract

Scene Graph Generation (SGG) represents objects and their interactions with a graph structure. Recently, many works have been devoted to solving the imbalance problem in SGG. However, by underestimating the head predicates throughout training, they wreck the features of head predicates, which provide general features for tail ones. Besides, assigning excessive attention to the tail predicates leads to semantic deviation. Based on this, we propose a novel SGG framework, learning to generate scene graphs from Head to Tail (SGG-HT), containing a Curriculum Re-weight Mechanism (CRM) and a Semantic Context Module (SCM). CRM first learns head/easy samples to obtain robust features of head predicates, and then gradually focuses on tail/hard ones. SCM is proposed to relieve semantic deviation by ensuring semantic consistency between the generated scene graph and the ground truth in both global and local representations. Experiments show that SGG-HT significantly alleviates the biased problem and achieves state-of-the-art performances on Visual Genome.
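The abstract does not give CRM's exact schedule, but the core idea, learning head/easy predicates first and gradually shifting weight toward tail/hard ones, can be illustrated with a minimal curriculum re-weighting sketch. Everything below (the function name, the uniform-to-inverse-frequency interpolation, the linear progress factor) is an illustrative assumption, not the paper's formula:

```python
import numpy as np

def curriculum_weights(class_freq, epoch, total_epochs):
    """Hypothetical curriculum re-weighting schedule (illustrative only,
    not the paper's exact CRM): start from uniform weights so head/easy
    predicates dominate early training, then linearly interpolate toward
    inverse-frequency weights that emphasize tail/hard predicates."""
    freq = np.asarray(class_freq, dtype=float)
    uniform = np.ones_like(freq)                 # early phase: all classes equal
    inv_freq = freq.sum() / (freq * len(freq))   # late phase: up-weight rare (tail) classes
    alpha = min(1.0, epoch / total_epochs)       # curriculum progress in [0, 1]
    w = (1 - alpha) * uniform + alpha * inv_freq
    return w / w.mean()                          # normalize to mean 1

# Example: 3 predicate classes with a head-heavy frequency distribution
freqs = [900, 90, 10]
early = curriculum_weights(freqs, epoch=0, total_epochs=10)   # uniform weights
late = curriculum_weights(freqs, epoch=10, total_epochs=10)   # tail-favoring weights
```

In practice such per-class weights would multiply the predicate classification loss, so the head classes shape the shared features early on before the tail classes receive extra emphasis.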
