Paper Title

On Data Scaling in Masked Image Modeling

Paper Authors

Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Yixuan Wei, Qi Dai, Han Hu

Paper Abstract

An important goal of self-supervised learning is to enable model pre-training to benefit from almost unlimited data. However, one method that has recently become popular, namely masked image modeling (MIM), is suspected to be unable to benefit from larger data. In this work, we break this misconception through extensive experiments, with data scales ranging from 10% of ImageNet-1K to full ImageNet-22K, model sizes ranging from 49 million to 1 billion, and training lengths ranging from 125K iterations to 500K iterations. Our study reveals that: (i) Masked image modeling is also demanding on larger data. We observed that very large models got over-fitted with relatively small data; (ii) The length of training matters. Large models trained with masked image modeling can benefit from more data with longer training; (iii) The validation loss in pre-training is a good indicator to measure how well the model performs for fine-tuning on multiple tasks. This observation allows us to pre-evaluate pre-trained models in advance without having to make costly trial-and-error assessments of downstream tasks. We hope that our findings will advance the understanding of masked image modeling in terms of scaling ability.
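
Finding (iii) suggests that the pre-training validation loss can stand in for expensive downstream evaluation when ranking pre-trained checkpoints. Below is a minimal sketch of how such a proxy check could look; the validation losses and fine-tuning accuracies are hypothetical placeholder numbers for illustration only, not results reported in the paper.

```python
import numpy as np

# Hypothetical measurements for a set of pre-trained checkpoints:
# MIM validation loss vs. fine-tuned ImageNet top-1 accuracy (%).
# These values are illustrative placeholders, not the paper's data.
val_loss = np.array([0.41, 0.39, 0.37, 0.35, 0.33])
ft_top1  = np.array([82.1, 82.8, 83.4, 83.9, 84.5])

# Pearson correlation between the two series; a strongly negative value
# indicates that lower pre-training validation loss predicts better
# fine-tuning accuracy, so checkpoints can be ranked without running
# a full fine-tuning trial for each one.
r = np.corrcoef(val_loss, ft_top1)[0, 1]
print(f"Correlation between validation loss and fine-tuning accuracy: r = {r:.3f}")
```

In this sketch, a near-perfect negative correlation would support using the cheap pre-training validation loss to pre-select checkpoints before committing to costly downstream fine-tuning.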
