Paper Title
PAL: Pretext-based Active Learning
Paper Authors
Paper Abstract
The goal of pool-based active learning is to judiciously select a fixed-size subset of unlabeled samples from a pool to query an oracle for their labels, in order to maximize the accuracy of a supervised learner. However, the implicit requirement that the oracle always assign correct labels is unreasonable in most situations. We propose an active learning technique for deep neural networks that is more robust to mislabeling than previously proposed techniques. Previous techniques rely on the task network itself to estimate the novelty of the unlabeled samples, but learning the task (generalization) and selecting samples (out-of-distribution detection) can be conflicting goals. We use a separate network to score the unlabeled samples for selection. The scoring network relies on self-supervision to model the distribution of the labeled samples, reducing the dependency on potentially noisy labels. To counter the paucity of data, we also deploy another head on the scoring network for regularization via multi-task learning, and use an unusual self-balancing hybrid scoring function. Furthermore, we divide each query into sub-queries before labeling to ensure that the query contains diverse samples. In addition to having a higher tolerance to mislabeling of samples by the oracle, the resultant technique also produces competitive accuracy in the absence of label noise. The technique also handles the on-the-fly introduction of new classes by temporarily increasing the sampling rate of those classes.
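The abstract's selection procedure (score unlabeled samples with a separate scoring network, then build the query from sub-queries to encourage diversity) can be sketched as below. This is a minimal illustration, not the paper's implementation: `score_fn` stands in for the self-supervised scoring network, and its second argument (the samples selected so far) is an assumption introduced here so that re-scoring between sub-queries can actually promote diversity.

```python
def select_query(pool, score_fn, query_size, num_subqueries=4):
    """Hypothetical sketch of sub-query-based selection.

    pool           -- iterable of unlabeled samples
    score_fn(s, S) -- higher means sample s looks more novel given the
                      already-selected set S (proxy for the scoring network)
    query_size     -- total number of samples to send to the oracle
    num_subqueries -- number of sub-queries the query is divided into
    """
    selected = []
    remaining = list(pool)
    sub_size = max(1, query_size // num_subqueries)
    while len(selected) < query_size and remaining:
        # Re-rank the remaining pool before each sub-query; because the
        # score depends on what was already picked, later sub-queries
        # favor samples unlike the earlier ones.
        remaining.sort(key=lambda s: score_fn(s, selected), reverse=True)
        take = min(sub_size, query_size - len(selected))
        selected.extend(remaining[:take])
        remaining = remaining[take:]
    return selected
```

For example, with scalar "samples" and a novelty score equal to the distance to the nearest already-selected sample, the second sub-query picks points far from the first one, illustrating why splitting the query yields a more diverse batch than a single top-k selection.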