Paper Title


Rethinking Pre-training and Self-training

Paper Authors

Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

Paper Abstract


Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, show a surprising result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training. For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data. Self-training, on the other hand, shows positive improvements from +1.3 to +3.4AP across all dataset sizes. In other words, self-training works well exactly on the same setup that pre-training does not work (using ImageNet to help COCO). On the PASCAL segmentation dataset, which is a much smaller dataset than COCO, though pre-training does help significantly, self-training improves upon the pre-trained model. On COCO object detection, we achieve 54.3AP, an improvement of +1.5AP over the strongest SpineNet model. On PASCAL segmentation, we achieve 90.5 mIOU, an improvement of +1.5% mIOU over the previous state-of-the-art result by DeepLabv3+.
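To make the self-training recipe discussed in the abstract concrete, below is a minimal sketch of the pseudo-labeling loop on a toy classification problem in PyTorch. Everything here is an illustrative assumption rather than the paper's actual setup: the paper works with object detection and segmentation models trained on COCO/PASCAL with much stronger image augmentation policies, whereas this sketch uses random tensors, a tiny MLP, and Gaussian noise as a stand-in for "strong data augmentation". It is meant only to show the teacher/pseudo-label/student structure that the abstract contrasts with ImageNet pre-training.

```python
# Minimal self-training (pseudo-labeling) sketch on a toy problem.
# Assumptions: toy data, a tiny MLP, and noise-based "augmentation";
# these are NOT the paper's detection/segmentation models or policies.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data: a small labeled set and a larger unlabeled set
# (stand-ins for the labeled target data and the extra unlabeled data).
x_labeled = torch.randn(64, 16)
y_labeled = (x_labeled.sum(dim=1) > 0).long()
x_unlabeled = torch.randn(512, 16)


def make_model() -> nn.Module:
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))


def train(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
          epochs: int = 100, lr: float = 1e-2) -> nn.Module:
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    return model


# 1) Train a teacher on the labeled data only.
teacher = train(make_model(), x_labeled, y_labeled)

# 2) Use the teacher to pseudo-label the unlabeled data.
with torch.no_grad():
    pseudo_labels = teacher(x_unlabeled).argmax(dim=1)


# 3) "Strong augmentation" is approximated here by Gaussian noise;
#    the paper relies on much stronger image augmentation policies.
def augment(x: torch.Tensor) -> torch.Tensor:
    return x + 0.3 * torch.randn_like(x)


# 4) Train a student from random initialization on labeled + pseudo-labeled data.
x_all = torch.cat([augment(x_labeled), augment(x_unlabeled)])
y_all = torch.cat([y_labeled, pseudo_labels])
student = train(make_model(), x_all, y_all)

with torch.no_grad():
    acc = (student(x_labeled).argmax(dim=1) == y_labeled).float().mean().item()
print(f"student accuracy on the labeled set: {acc:.2f}")
```

Note that the student is trained from random initialization rather than from a pre-trained checkpoint, mirroring the abstract's observation that self-training delivers gains in exactly the settings where ImageNet pre-training does not help.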
