Paper Title
Self-Supervised Contextual Bandits in Computer Vision
Paper Authors
Paper Abstract
Contextual bandits are a common problem faced by machine learning practitioners in domains ranging from hypothesis testing to product recommendations. Many approaches exploit rich data representations for contextual bandit problems, with varying degrees of success. Self-supervised learning is a promising way to find rich data representations without explicit labels. In a typical self-supervised learning scheme, the primary task is defined by the problem objective (e.g., clustering, classification, or embedding generation) and the secondary task is defined by the self-supervision objective (e.g., rotation prediction, words in a neighborhood, or colorization). In the usual self-supervised setting, implicit labels for the secondary task are learned from the training data. In the contextual bandit setting, however, we do not have the advantage of implicit labels, owing to the lack of data in the initial phase of learning. We provide a novel approach that tackles this issue by combining a contextual bandit objective with a self-supervision objective. By augmenting contextual bandit learning with self-supervision, we obtain a better cumulative reward. Our results on eight popular computer vision datasets show substantial gains in cumulative reward. We also describe cases where the proposed scheme does not perform optimally and give alternative methods for better learning in those cases.
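The abstract does not give the paper's exact formulation, but the core idea of combining the two objectives can be illustrated as a weighted sum of a bandit reward-prediction loss and an auxiliary self-supervised loss (here, rotation prediction over four orientations). All function names, the squared-error bandit loss, and the weight `lam` are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def bandit_loss(pred_reward, observed_reward):
    # Squared error between the model's predicted reward and the
    # reward observed for the chosen arm (an illustrative choice).
    return (pred_reward - observed_reward) ** 2

def ssl_loss(rotation_logits, true_rotation):
    # Cross-entropy for the auxiliary rotation-prediction task:
    # the model classifies which of 4 rotations was applied to the image.
    logits = rotation_logits - rotation_logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[true_rotation])

def combined_loss(pred_reward, observed_reward,
                  rotation_logits, true_rotation, lam=0.1):
    # Augment the contextual bandit objective with the
    # self-supervision objective, weighted by a hyperparameter lam.
    return (bandit_loss(pred_reward, observed_reward)
            + lam * ssl_loss(rotation_logits, true_rotation))
```

The self-supervised term supplies a training signal from the context images themselves, which is the point of the combination: it shapes the representation even in early rounds, when few reward observations are available.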