Paper Title
Debiased Contrastive Learning
Paper Authors
Paper Abstract
A prominent technique for self-supervised representation learning has been to contrast semantically similar and dissimilar pairs of samples. Without access to labels, dissimilar (negative) points are typically taken to be randomly sampled datapoints, implicitly accepting that these points may in fact have the same label. Perhaps unsurprisingly, we observe that sampling negative examples from truly different labels improves performance in a synthetic setting where labels are available. Motivated by this observation, we develop a debiased contrastive objective that corrects for the sampling of same-label datapoints, even without knowledge of the true labels. Empirically, the proposed objective consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks. Theoretically, we establish generalization bounds for the downstream classification task.
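For concreteness, below is a minimal PyTorch sketch of a debiased contrastive (InfoNCE-style) objective of the kind the abstract describes: the estimated contribution of false negatives (randomly sampled points that happen to share the anchor's label) is subtracted from the negative term, using an assumed class prior. The function name `debiased_nce_loss`, the prior parameter `tau_plus`, and the default hyperparameters are illustrative assumptions, not values taken from the paper.

```python
import math

import torch
import torch.nn.functional as F


def debiased_nce_loss(z1, z2, tau_plus=0.1, temperature=0.5):
    """Debiased contrastive loss for a batch of positive pairs (z1[i], z2[i]).

    tau_plus is the assumed class prior: the probability that a randomly
    drawn "negative" actually shares the anchor's latent label.
    """
    b = z1.size(0)
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # (2B, d)

    # Exponentiated pairwise similarities between all 2B embeddings.
    sim = torch.exp(z @ z.t() / temperature)            # (2B, 2B)

    # Mask out self-similarities and the positive partner, keeping the
    # 2B - 2 "negatives" per anchor (some of which may be false negatives).
    idx = torch.arange(2 * b, device=z.device)
    mask = torch.ones(2 * b, 2 * b, dtype=torch.bool, device=z.device)
    mask[idx, idx] = False
    mask[idx, (idx + b) % (2 * b)] = False
    neg = sim.masked_select(mask).view(2 * b, -1)       # (2B, 2B-2)

    # Positive similarity for each anchor (both directions of the pair).
    pos = torch.exp((z1 * z2).sum(dim=1) / temperature)
    pos = torch.cat([pos, pos], dim=0)                  # (2B,)

    # Debias the negative term: subtract the estimated contribution of
    # same-label samples, then clamp at the estimator's theoretical floor.
    n = 2 * b - 2
    ng = (neg.sum(dim=-1) - tau_plus * n * pos) / (1.0 - tau_plus)
    ng = torch.clamp(ng, min=n * math.exp(-1.0 / temperature))

    return -torch.log(pos / (pos + ng)).mean()


# Toy usage: embeddings of two augmented views of the same 128 inputs.
z1, z2 = torch.randn(128, 64), torch.randn(128, 64)
print(debiased_nce_loss(z1, z2).item())
```

In this sketch, setting `tau_plus=0` recovers the standard (biased) contrastive loss, which makes the effect of the correction term easy to ablate.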