Paper Title

Understanding Self-supervised Learning with Dual Deep Networks

Paper Authors

Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli

Paper Abstract

We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR). First, we prove that in each SGD update of SimCLR with various loss functions, including simple contrastive loss, soft Triplet loss and InfoNCE loss, the weights at each layer are updated by a \emph{covariance operator} that specifically amplifies initial random selectivities that vary across data samples but survive averages over data augmentations. To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a \emph{hierarchical latent tree model} (HLTM) and prove that the hidden neurons of deep ReLU networks can learn the latent variables in the HLTM, despite the fact that the network receives \emph{no direct supervision} from these unobserved latent variables. This leads to a provable emergence of hierarchical features through the amplification of initially random selectivities by contrastive SSL. Extensive numerical studies justify our theoretical findings. Code is released at https://github.com/facebookresearch/luckmatters/tree/master/ssl.
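
Below is a minimal PyTorch sketch of the dual-network contrastive setup the abstract refers to (not the authors' code; see the repository linked above for that). A single deep ReLU network is applied to two augmented views of each sample and trained with a simplified, one-directional InfoNCE loss; SimCLR's NT-Xent additionally symmetrizes over both views and uses within-view negatives. The network sizes, Gaussian augmentation noise, temperature, and batch size are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepReLUNet(nn.Module):
    """Plain multi-layer ReLU network standing in for both branches of the dual pair."""
    def __init__(self, dims=(128, 64, 32)):
        super().__init__()
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        self.net = nn.Sequential(*layers[:-1])  # drop the ReLU after the last layer

    def forward(self, x):
        return self.net(x)

def info_nce_loss(z1, z2, temperature=0.5):
    """Simplified InfoNCE over a batch of paired embeddings z1[i] <-> z2[i]."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # [N, N] cosine-similarity matrix
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: x1 and x2 play the role of two augmentations of the same batch.
net = DeepReLUNet()
x = torch.randn(16, 128)
x1 = x + 0.1 * torch.randn_like(x)
x2 = x + 0.1 * torch.randn_like(x)
loss = info_nce_loss(net(x1), net(x2))
loss.backward()  # per-layer weight gradients: the quantities whose expectation the
                 # paper shows takes the form of a covariance operator over data
                 # samples and augmentations

Under the paper's framework, the expected per-layer weight update induced by losses of this family (simple contrastive, soft Triplet, InfoNCE) is what takes the form of the covariance operator discussed in the abstract.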
