Paper title
Spatial Consistency Loss for Training Multi-Label Classifiers from Single-Label Annotations
Paper authors
Paper abstract
As natural images usually contain multiple objects, multi-label image classification is more applicable "in the wild" than single-label classification. However, exhaustively annotating images with every object of interest is costly and time-consuming. We aim to train multi-label classifiers from single-label annotations only. We show that adding a consistency loss, which ensures that the predictions of the network are consistent over consecutive training epochs, is a simple yet effective method to train multi-label classifiers in a weakly supervised setting. We further extend this approach spatially by ensuring consistency of the spatial feature maps produced over consecutive training epochs, maintaining per-class running-average heatmaps for each training image. We show that this spatial consistency loss further improves the multi-label mAP of the classifiers. In addition, we show that this method overcomes shortcomings of the "crop" data augmentation by recovering the correct supervision signal even when most of the single ground-truth object is cropped out of the input image. We demonstrate gains of the consistency and spatial consistency losses over the binary cross-entropy baseline, and over competing methods, on MS-COCO and Pascal VOC. We also demonstrate improved multi-label classification mAP on ImageNet-1K using the ReaL multi-label validation set.
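The abstract does not give the exact form of the losses, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: the consistency term is taken to be a mean squared error against a per-image running average of past predictions, the running average is an exponential moving average with a hypothetical `momentum` parameter, and the supervised term applies binary cross-entropy to the single annotated class only. All function names are illustrative and do not come from the paper. Because the functions operate elementwise, the same code accepts either class scores of shape `(N, C)` or per-class heatmaps of shape `(N, C, H, W)`; aligning heatmaps across different crops is not handled here.

```python
import numpy as np


def single_positive_bce(probs, label_idx, eps=1e-7):
    """BCE on the one annotated class per image (assumption: other classes are ignored)."""
    rows = np.arange(len(label_idx))
    p = np.clip(probs[rows, label_idx], eps, 1.0)
    return float(-np.mean(np.log(p)))


def consistency_loss(current, running_avg):
    """Assumed form: mean squared error between current outputs and their running average."""
    return float(np.mean((current - running_avg) ** 2))


def update_running_avg(running_avg, current, momentum=0.9):
    """Exponential moving average over epochs; `momentum` is a hypothetical hyperparameter."""
    return momentum * running_avg + (1.0 - momentum) * current


# Toy usage: 2 images, 3 classes, one annotated label per image.
probs = np.array([[0.8, 0.6, 0.1],
                  [0.2, 0.9, 0.7]])
labels = np.array([0, 1])
running = np.full_like(probs, 0.5)  # state kept per training image across epochs

total = single_positive_bce(probs, labels) + 1.0 * consistency_loss(probs, running)
running = update_running_avg(running, probs)

# The same helpers apply to per-class heatmaps of shape (N, C, H, W).
heatmaps = np.random.default_rng(0).random((2, 3, 4, 4))
heat_running = np.zeros_like(heatmaps)
spatial_term = consistency_loss(heatmaps, heat_running)
heat_running = update_running_avg(heat_running, heatmaps)
```

The weight of `1.0` on the consistency term is arbitrary. In a real training loop the running averages would be stored per training image and looked up by image index at each epoch.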