Paper title
Inverse Image Frequency for Long-tailed Image Recognition
Paper authors
Paper abstract
The long-tailed distribution is a common phenomenon in the real world. Large-scale image datasets extracted from the real world inevitably exhibit the long-tailed property, and models trained on such imbalanced data achieve high performance on the over-represented categories but struggle on the under-represented ones, leading to biased predictions and performance degradation. To address this challenge, we propose a novel de-biasing method named Inverse Image Frequency (IIF). IIF is a multiplicative margin adjustment transformation of the logits in the classification layer of a convolutional neural network. Our method achieves stronger performance than similar works, and it is especially useful for downstream tasks such as long-tailed instance segmentation because it produces fewer false positive detections. Our extensive experiments show that IIF surpasses the state of the art on many long-tailed benchmarks such as ImageNet-LT, CIFAR-LT, Places-LT and LVIS, reaching 55.8% top-1 accuracy with ResNet50 on ImageNet-LT and 26.2% segmentation AP with MaskRCNN on LVIS. Code is available at https://github.com/kostas1515/iif
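As a rough illustration of what a multiplicative adjustment of logits by inverse image frequency could look like, here is a minimal sketch. The abstract does not state the exact formula, so the weight log(N / n_c) (chosen by analogy with inverse document frequency), the function names, and the toy class counts are all assumptions for illustration; they are not taken from the paper or its repository.

```python
import math

def inverse_image_frequency(class_counts):
    """Per-class weight log(N / n_c), where N is the total image count.

    ASSUMPTION: this form mirrors inverse document frequency; the
    abstract itself does not give the formula.
    """
    total = sum(class_counts)
    return [math.log(total / n) for n in class_counts]

def adjust_logits(logits, weights):
    """Multiply each class logit by its weight (multiplicative adjustment)."""
    return [z * w for z, w in zip(logits, weights)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical long-tailed training set: one head class, two rarer classes.
class_counts = [900, 90, 10]
# Hypothetical raw logits that slightly favor the head class.
logits = [2.0, 1.8, 1.7]

weights = inverse_image_frequency(class_counts)
adjusted = adjust_logits(logits, weights)
probs = softmax(adjusted)
```

In this toy case the rare classes receive larger weights, so the adjusted prediction moves away from the head class. Note that a purely multiplicative rescaling behaves differently for negative logits, which this sketch does not address.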