Paper Title

Understanding Collapse in Non-Contrastive Siamese Representation Learning

Paper Authors

Li, Alexander C., Efros, Alexei A., Pathak, Deepak

Paper Abstract

Contrastive methods have led a recent surge in the performance of self-supervised representation learning (SSL). Recent methods like BYOL or SimSiam purportedly distill these contrastive methods down to their essence, removing bells and whistles, including the negative examples, that do not contribute to downstream performance. These "non-contrastive" methods work surprisingly well without using negatives even though the global minimum lies at trivial collapse. We empirically analyze these non-contrastive methods and find that SimSiam is extraordinarily sensitive to dataset and model size. In particular, SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size. We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels. We further analyze architectural design choices and their effect on the downstream performance. Finally, we demonstrate that shifting to a continual learning setting acts as a regularizer and prevents collapse, and a hybrid between continual and multi-epoch training can improve linear probe accuracy by as many as 18 percentage points using ResNet-18 on ImageNet. Our project page is at https://alexanderli.com/noncontrastive-ssl/.
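The abstract proposes a metric for the degree of partial dimensional collapse but does not define it here. As a hedged illustration of how such a label-free collapse measure can work, the sketch below computes the entropy-based effective rank of the embedding spectrum: embeddings that fill their feature space score near the full dimensionality, while collapsed ones score near their true (lower) rank. The function name and the choice of effective rank are my assumptions for illustration, not the paper's actual metric.

```python
import numpy as np

def effective_rank(embeddings: np.ndarray) -> float:
    """Entropy-based effective rank of a batch of embeddings.

    NOTE: illustrative stand-in for a collapse metric; the paper's own
    metric is not reproduced here. Lower values relative to the feature
    dimension indicate stronger dimensional collapse.
    """
    # Center the embeddings so the spectrum reflects variance directions.
    z = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Singular values of the centered embedding matrix.
    s = np.linalg.svd(z, compute_uv=False)
    # Normalize into a distribution over spectral directions.
    p = s / s.sum()
    p = p[p > 0]
    # exp(Shannon entropy) = effective number of active dimensions.
    return float(np.exp(-(p * np.log(p)).sum()))
```

For example, 64-dimensional embeddings drawn at random score close to 64, while embeddings confined to a 4-dimensional subspace score close to 4, so the metric can be tracked during self-supervised training without any labels or fine-tuning.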
