Paper Title
Do More Negative Samples Necessarily Hurt in Contrastive Learning?
Paper Authors
Paper Abstract
Recent investigations in noise contrastive estimation suggest, both empirically and theoretically, that while having more "negative samples" in the contrastive loss improves downstream classification performance initially, beyond a threshold it hurts downstream performance due to a "collision-coverage" trade-off. But is such a phenomenon inherent in contrastive learning? We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. Along the way, we give a structural characterization of the optimal representation in our framework for noise contrastive estimation. We also provide empirical support for our theoretical results on the CIFAR-10 and CIFAR-100 datasets.
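For concreteness, the following is a sketch of the (population) contrastive loss with N negative samples as it is commonly written in this line of work; the notation is illustrative and not taken from the paper. Given a representation f, a positive pair (x, x⁺) drawn from the same latent class, and negatives x₁⁻, …, x_N⁻ drawn independently from the data distribution, the loss is

\[
L_N(f) \;=\; \mathbb{E}\left[ -\log \frac{\exp\!\big(f(x)^\top f(x^{+})\big)}{\exp\!\big(f(x)^\top f(x^{+})\big) + \sum_{i=1}^{N} \exp\!\big(f(x)^\top f(x_i^{-})\big)} \right].
\]

Under this reading, the "collision-coverage" trade-off refers to the tension between collisions (a negative x_i⁻ falling in the same latent class as x, which becomes more likely as N grows) and coverage (more negatives spanning more classes); the abstract's claim is that, for the representation minimizing the population loss above, this trade-off does not translate into degraded downstream classification performance as N increases.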