Paper Title
G-SimCLR : Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling
Paper Authors
Paper Abstract
In the realm of computer vision, it is evident that deep neural networks perform better in a supervised setting with a large amount of labeled data. The representations learned with supervision are not only of high quality but also help the model achieve higher accuracy. However, the collection and annotation of a large dataset are costly and time-consuming. To avoid this cost, there has been a lot of research in unsupervised visual representation learning, especially in a self-supervised setting. Among the recent advances in self-supervised methods for visual recognition, Chen et al. show with SimCLR that good-quality representations can indeed be learned without explicit supervision. In SimCLR, the authors maximize the similarity between augmentations of the same image and minimize the similarity between augmentations of different images. A linear classifier trained on the representations learned with this approach yields 76.5% top-1 accuracy on the ImageNet ILSVRC-2012 dataset. In this work, we propose that, with the normalized temperature-scaled cross-entropy (NT-Xent) loss function (as used in SimCLR), it is beneficial not to have images of the same category in the same batch. In an unsupervised setting, the category information of the images is missing. We use the latent-space representations of a denoising autoencoder trained on the unlabeled dataset and cluster them with k-means to obtain pseudo labels. With this a priori information, we form batches in which no two images come from the same category. We report comparable performance enhancements on the CIFAR10 dataset and a subset of the ImageNet dataset. We refer to our method as G-SimCLR.
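For reference, the NT-Xent loss from SimCLR, which G-SimCLR retains, is defined for a positive pair (i, j) of augmented views of the same image as

$$
\ell_{i,j} = -\log \frac{\exp\bigl(\operatorname{sim}(z_i, z_j)/\tau\bigr)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\bigl(\operatorname{sim}(z_i, z_k)/\tau\bigr)}, \qquad \operatorname{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert},
$$

where z_i and z_j are the projections of the two views, τ is the temperature, and the remaining 2N − 2 augmented views in a batch of N images serve as negatives. This makes the motivation clear: if two images of the same category land in one batch, the loss treats them as negatives and pushes their representations apart.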
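Below is a minimal sketch of the pseudo-labelling stage described in the abstract, assuming flattened image inputs and a Keras-style fully connected denoising autoencoder; the layer sizes, noise level, and cluster count are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans

def build_denoising_autoencoder(input_dim=3072, latent_dim=128):
    # Fully connected encoder-decoder; sizes are illustrative assumptions.
    inputs = tf.keras.Input(shape=(input_dim,))
    x = tf.keras.layers.Dense(512, activation="relu")(inputs)
    latent = tf.keras.layers.Dense(latent_dim, activation="relu", name="latent")(x)
    x = tf.keras.layers.Dense(512, activation="relu")(latent)
    outputs = tf.keras.layers.Dense(input_dim, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs), tf.keras.Model(inputs, latent)

# x_train stands in for the unlabeled images, flattened and scaled to [0, 1].
x_train = np.random.rand(1024, 3072).astype("float32")  # placeholder data
noisy = np.clip(x_train + 0.1 * np.random.randn(*x_train.shape), 0.0, 1.0)

# Train the autoencoder to reconstruct clean images from noisy inputs.
autoencoder, encoder = build_denoising_autoencoder()
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(noisy, x_train, epochs=10, batch_size=256, verbose=0)

# Cluster the latent representations with k-means to obtain pseudo labels.
latent_codes = encoder.predict(x_train, verbose=0)
pseudo_labels = KMeans(n_clusters=10, n_init=10).fit_predict(latent_codes)
```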
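Given the pseudo labels, batches can then be assembled so that no two images in a batch share a cluster. The following is a greedy round-robin sketch of that guided batching idea; the function name and scheduling are illustrative, not necessarily the paper's exact procedure.

```python
import random
from collections import defaultdict

def guided_batches(pseudo_labels, batch_size, seed=0):
    """Yield batches of sample indices with at most one image per pseudo label."""
    rng = random.Random(seed)
    buckets = defaultdict(list)  # pseudo label -> sample indices
    for idx, label in enumerate(pseudo_labels):
        buckets[label].append(idx)
    for bucket in buckets.values():
        rng.shuffle(bucket)
    # Each pass draws at most one sample per label, so no batch can contain
    # two images from the same cluster; with k clusters the effective batch
    # size is capped at min(batch_size, k).
    while any(buckets.values()):
        batch = []
        for bucket in buckets.values():
            if bucket and len(batch) < batch_size:
                batch.append(bucket.pop())
        yield batch

# Toy usage: three clusters, so batches of size 3 never repeat a pseudo label.
for batch in guided_batches([0, 1, 2, 0, 1, 2, 0, 1], batch_size=3):
    print(batch)
```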