Paper Title
Contrastive Transformation for Self-supervised Correspondence Learning
Paper Authors
Paper Abstract
In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation. The intra-video learning transforms the image contents across frames within a single video via the frame pair-wise affinity. To obtain the discriminative representation for instance-level separation, we go beyond the intra-video analysis and construct the inter-video affinity to facilitate the contrastive transformation across different videos. By forcing the transformation consistency between intra- and inter-video levels, the fine-grained correspondence associations are well preserved and the instance-level feature discrimination is effectively reinforced. Our simple framework outperforms the recent self-supervised correspondence methods on a range of visual tasks including video object tracking (VOT), video object segmentation (VOS), pose keypoint tracking, etc. It is worth mentioning that our method also surpasses the fully-supervised affinity representation (e.g., ResNet) and performs competitively against the recent fully-supervised algorithms designed for the specific tasks (e.g., VOT and VOS).
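To illustrate the frame pair-wise affinity transformation mentioned in the abstract, the following is a minimal sketch (not the authors' released code): it builds a soft affinity between the feature maps of two frames and reconstructs the target frame's features as an affinity-weighted combination of the reference frame's features. All tensor names, shapes, and the temperature value are illustrative assumptions.

```python
# Minimal sketch of a frame pair-wise affinity transformation (assumed details,
# not the authors' implementation). Feature maps are assumed to come from some
# backbone; shapes and temperature are illustrative.
import torch
import torch.nn.functional as F

def pairwise_affinity(feat_ref, feat_tgt, temperature=0.07):
    """Row-stochastic affinity from target positions to reference positions.

    feat_ref, feat_tgt: (C, H, W) feature maps of a reference and a target frame.
    Returns an (H*W, H*W) matrix whose rows sum to 1.
    """
    c, h, w = feat_ref.shape
    ref = F.normalize(feat_ref.reshape(c, h * w), dim=0)  # (C, N), unit-norm channels
    tgt = F.normalize(feat_tgt.reshape(c, h * w), dim=0)  # (C, N)
    affinity = tgt.t() @ ref / temperature                # (N, N) cosine similarities
    return affinity.softmax(dim=-1)

def transform(feat_ref, affinity, h, w):
    """Reconstruct target-frame features as an affinity-weighted copy of the reference."""
    c = feat_ref.shape[0]
    ref = feat_ref.reshape(c, h * w)                      # (C, N)
    rec = ref @ affinity.t()                              # (C, N): each column mixes reference positions
    return rec.reshape(c, h, w)

# Toy usage: an intra-video loss could compare the reconstruction against the
# actual target-frame features; an inter-video branch would build a similar
# affinity across frames from different videos for the contrastive term.
feat1 = torch.randn(256, 32, 32)  # reference frame features (assumed shape)
feat2 = torch.randn(256, 32, 32)  # target frame features
A = pairwise_affinity(feat1, feat2)
feat2_rec = transform(feat1, A, 32, 32)
```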