Constrained Structure Learning for Scene Graph Generation

Paper Title

Constrained Structure Learning for Scene Graph Generation

Authors

Daqi Liu, Miroslaw Bober, Josef Kittler

Abstract

As a structured prediction task, scene graph generation aims to build a visually-grounded scene graph to explicitly model objects and their relationships in an input image. Currently, the mean field variational Bayesian framework is the de facto methodology used by existing methods, in which the unconstrained inference step is often implemented by a message passing neural network. However, such a formulation fails to explore other inference strategies, and largely ignores the more general constrained optimization models. In this paper, we present a constrained structure learning method, for which an explicit constrained variational inference objective is proposed. Instead of applying the ubiquitous message-passing strategy, a generic constrained optimization method - entropic mirror descent - is utilized to solve the constrained variational inference step. We validate the proposed generic model on various popular scene graph generation benchmarks and show that it outperforms the state-of-the-art methods.
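
For readers unfamiliar with the optimizer named in the abstract, the following is a minimal sketch of generic entropic mirror descent on the probability simplex. It is not the authors' model: the function names, the step size, and the toy free-energy objective with scores `theta` are all assumptions chosen for illustration. The sketch only shows the multiplicative update followed by renormalization, which keeps every iterate a valid distribution without an explicit projection step.

```python
import math

def entropic_mirror_descent(grad, q0, step=0.5, iters=200):
    """Minimise a function over the probability simplex.

    grad: callable returning the gradient (list of floats) at q.
    q0:   strictly positive starting point that sums to 1.
    (Hypothetical helper for illustration, not from the paper.)
    """
    q = list(q0)
    for _ in range(iters):
        g = grad(q)
        # Multiplicative update q_i <- q_i * exp(-step * g_i), computed in
        # log space and shifted by the max for numerical stability.
        logits = [math.log(qi) - step * gi for qi, gi in zip(q, g)]
        m = max(logits)
        w = [math.exp(l - m) for l in logits]
        z = sum(w)
        q = [wi / z for wi in w]  # renormalise: the iterate stays on the simplex
    return q

# Toy objective (an assumption): F(q) = sum_i q_i * (log q_i - theta_i),
# whose minimiser over the simplex is softmax(theta).
theta = [2.0, 0.5, -1.0]

def free_energy_grad(q):
    return [math.log(qi) - ti + 1.0 for qi, ti in zip(q, theta)]

q_star = entropic_mirror_descent(free_energy_grad, [1 / 3, 1 / 3, 1 / 3])
```

On this toy problem the iterates approach the softmax of `theta`, which is the closed-form minimizer, so the result can be checked directly. The simplex constraint is handled by the renormalization alone, which is what makes the method a natural fit for constrained inference over categorical distributions.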
