Paper Title

Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics

Paper Authors

Simon Jenni, Hailin Jin, Paolo Favaro

Paper Abstract

We introduce a novel principle for self-supervised feature learning based on the discrimination of specific transformations of an image. We argue that the generalization capability of learned features depends on what image neighborhood size is sufficient to discriminate different image transformations: The larger the required neighborhood size and the more global the image statistics that the feature can describe. An accurate description of global image statistics allows to better represent the shape and configuration of objects and their context, which ultimately generalizes better to new tasks such as object classification and detection. This suggests a criterion to choose and design image transformations. Based on this criterion, we introduce a novel image transformation that we call limited context inpainting (LCI). This transformation inpaints an image patch conditioned only on a small rectangular pixel boundary (the limited context). Because of the limited boundary information, the inpainter can learn to match local pixel statistics, but is unlikely to match the global statistics of the image. We claim that the same principle can be used to justify the performance of transformations such as image rotations and warping. Indeed, we demonstrate experimentally that learning to discriminate transformations such as LCI, image warping and rotations, yields features with state of the art generalization capabilities on several datasets such as Pascal VOC, STL-10, CelebA, and ImageNet. Remarkably, our trained features achieve a performance on Places on par with features trained through supervised learning with ImageNet labels.
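To make the pretext task concrete, the sketch below builds (transformed image, label) pairs for a transformation-discrimination classifier: labels 0–3 are image rotations, and label 4 is a toy stand-in for limited context inpainting (LCI) that replaces a patch interior with noise matched to the statistics of its thin pixel boundary. This is only an illustration of the patch/boundary geometry under assumed shapes; the actual method trains a neural inpainter conditioned on that boundary, and the function and parameter names here are hypothetical.

```python
import numpy as np

def rotate_k(img, k):
    """Rotate an H x W x C image by k * 90 degrees."""
    return np.rot90(img, k, axes=(0, 1))

def limited_context_inpaint_stub(img, top, left, size, border=4, rng=None):
    """Toy stand-in for LCI: overwrite the interior of a patch with noise
    whose mean/std match the thin pixel boundary (the 'limited context').
    The paper trains an inpainter network; this only shows the geometry."""
    rng = np.random.default_rng(rng)
    out = img.copy().astype(np.float64)
    patch = out[top:top + size, left:left + size]           # view into out
    interior = patch[border:-border, border:-border]        # region to replace
    boundary_mask = np.ones(patch.shape[:2], dtype=bool)
    boundary_mask[border:-border, border:-border] = False   # keep only the rim
    boundary = patch[boundary_mask]
    interior[...] = rng.normal(boundary.mean(),
                               boundary.std() + 1e-8,
                               interior.shape)
    return np.clip(out, 0.0, 1.0)

def make_pretext_example(img, label, rng=None):
    """Label 0: identity; 1-3: rotations by 90/180/270; 4: LCI stand-in."""
    if label in (0, 1, 2, 3):
        return rotate_k(img, label)
    return limited_context_inpaint_stub(img, top=8, left=8, size=16, rng=rng)

img = np.random.default_rng(0).random((32, 32, 3))
for label in range(5):
    x = make_pretext_example(img, label)
    print(label, x.shape)
```

A classifier trained to predict `label` from `x` must attend to statistics beyond a small neighborhood, since the LCI stand-in matches local pixel statistics by construction.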
