Paper Title
NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation
Paper Authors
Paper Abstract
Nearly all existing scene graph generation (SGG) models have overlooked the ground-truth annotation quality of mainstream SGG datasets, i.e., they assume: 1) all the manually annotated positive samples are equally correct; 2) all the un-annotated negative samples are absolutely background. In this paper, we argue that neither of these assumptions applies to SGG: there are numerous noisy ground-truth predicate labels that break both assumptions and harm the training of unbiased SGG models. To this end, we propose a novel NoIsy label CorrEction and Sample Training strategy for SGG: NICEST. Specifically, it consists of two parts, NICE and NIST, which rule out these noisy-label issues by generating high-quality samples and by an effective training strategy, respectively. NICE first detects noisy samples and then reassigns them higher-quality soft predicate labels. NIST is a multi-teacher knowledge-distillation-based training strategy that enables the model to learn unbiased fusion knowledge. A dynamic trade-off weighting strategy in NIST is designed to penalize the bias of different teachers. Thanks to the model-agnostic nature of both NICE and NIST, our NICEST can be seamlessly incorporated into any SGG architecture to boost its performance on different predicate categories. In addition, to better evaluate the generalization of SGG models, we further propose a new benchmark, VG-OOD, by re-organizing the prevalent VG dataset and deliberately making the predicate distributions of the training and test sets as different as possible for each subject-object category pair. This new benchmark helps disentangle the influence of subject-object category-based frequency biases. Extensive ablations and results on different backbones and tasks have attested to the effectiveness and generalization ability of each component of NICEST.
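The abstract describes NIST as multi-teacher knowledge distillation with a dynamic trade-off weighting over teachers. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch: each teacher contributes a KL-divergence term against the student, and the per-teacher weights are computed dynamically from those divergences (a softmax over negative divergence, a placeholder stand-in for the paper's bias-penalizing weighting). The function name and weighting rule are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Illustrative multi-teacher knowledge distillation loss (sketch).

    Computes a KL-divergence term per teacher, then combines them with
    dynamic weights derived from the divergences themselves. NOTE: this
    weighting rule is a hypothetical placeholder, not NICEST's actual
    dynamic trade-off strategy.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_terms = []
    for t_logits in teacher_logits_list:
        p_teacher = F.softmax(t_logits / temperature, dim=-1)
        # kl_div expects log-probabilities as input and probabilities as target
        kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
        kd_terms.append(kd)
    kd_terms = torch.stack(kd_terms)                  # [num_teachers]
    # Dynamic trade-off weights: teachers with larger divergence get
    # smaller weight (detached so weights are not back-propagated through).
    weights = F.softmax(-kd_terms.detach(), dim=0)
    return (weights * kd_terms).sum() * temperature ** 2
```

Since each KL term is non-negative and the weights form a convex combination, the resulting loss is non-negative and can be added to the standard predicate-classification loss of any SGG backbone, consistent with the model-agnostic claim above.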