Context-aware Mixture-of-Experts for Unbiased Scene Graph Generation

Paper Title

Context-aware Mixture-of-Experts for Unbiased Scene Graph Generation

Authors

Liguang Zhou, Yuhongze Zhou, Tin Lun Lam, Yangsheng Xu

Abstract

Scene graph generation (SGG) has made tremendous progress in recent years. However, the underlying long-tailed distribution of predicate classes remains a challenging problem. To cope with extremely imbalanced predicate distributions, existing approaches usually construct complicated context encoders to extract the intrinsic relevance of scene context to predicates, together with complex networks to improve the learning ability of the model. To address the unbiased SGG problem, we introduce a simple yet effective method dubbed Context-Aware Mixture-of-Experts (CAME) that improves model diversity and mitigates biased SGG without complicated design. Specifically, we propose to integrate the mixture of experts with a divide-and-ensemble strategy to remedy the severely long-tailed distribution of predicate classes; this strategy is applicable to the majority of unbiased scene graph generators. The bias in SGG is thereby reduced, and the model tends to produce more evenly distributed predicate predictions. However, experts with identical weights are not sufficiently diverse to differentiate between the various predicate distribution levels. To enable the network to dynamically exploit the rich scene context and further boost the diversity of the model, we simply use the built-in module to create a context encoder. The importance of each expert to the scene context, and of each predicate to each expert, is dynamically associated through expert weighting (EW) and predicate weighting (PW) strategies. We have conducted extensive experiments on three tasks using the Visual Genome dataset, showing that CAME outperforms recent methods and achieves state-of-the-art performance. Our code will be made publicly available.
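
The abstract does not give the exact form of the expert weighting (EW) and predicate weighting (PW) strategies, so the following is only a minimal sketch of one way a context-dependent mixture of experts could combine predicate logits. It is not the authors' implementation. The softmax gate for EW, the sigmoid gate for PW, and every name here (combine_experts, gate_weights, predicate_gates) are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def combine_experts(
    context: Vector,
    expert_logits: Matrix,
    gate_weights: Matrix,
    predicate_gates: List[Matrix],
) -> Vector:
    """Blend per-expert predicate logits using context-dependent weights.

    context:         scene-context encoding, length D
    expert_logits:   K rows of P predicate logits, one row per expert
    gate_weights:    K x D matrix scoring each expert (assumed form of EW)
    predicate_gates: K matrices of shape P x D (assumed form of PW)
    """
    # Expert weighting (assumed): softmax over experts, conditioned on context.
    expert_w = softmax(matvec(gate_weights, context))

    num_predicates = len(expert_logits[0])
    combined = [0.0] * num_predicates
    for k, logits in enumerate(expert_logits):
        # Predicate weighting (assumed): a per-predicate sigmoid gate per expert.
        predicate_w = [sigmoid(s) for s in matvec(predicate_gates[k], context)]
        for p in range(num_predicates):
            combined[p] += expert_w[k] * predicate_w[p] * logits[p]
    return combined


if __name__ == "__main__":
    # Two experts, two predicates, a 2-dimensional context; all values are toy data.
    context = [1.0, -1.0]
    expert_logits = [[2.0, 0.0], [0.0, 4.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(combine_experts(context, expert_logits, zeros, [zeros, zeros]))
```

With all gate parameters set to zero, every expert receives the same weight and every predicate gate sits at its midpoint, which is the identical-weights case the abstract describes as insufficiently diverse. Non-zero gate parameters let the context shift weight toward particular experts and predicates.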
