论文标题

帕累托优化用于在分发数据方面的积极学习

Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

论文作者

Zhan, Xueying, Dai, Zeyu, Wang, Qingzhong, Li, Qing, Xiong, Haoyi, Dou, Dejing, Chan, Antoni B.

论文摘要

基于池的主动学习(AL)通过依次从大型未标记的数据池中选择信息的未标记样本并从Oracle/注释者查询标签,从而取得了巨大成功。但是,现有的AL采样策略可能在分布(OOD)数据方案中无法很好地工作,其中未标记的数据池包含一些不属于目标任务类别的数据示例。在OOD数据情景下实现良好的AL性能是一项具有挑战性的任务,因为Al采样策略与OOD样本检测之间的自然冲突。 Al选择很难由当前基本分类器进行分类的数据(例如,预测类概率具有较高熵的样品),而OOD样本往往比分布(ID)数据具有更均匀的预测类概率(即高熵)。在本文中,我们提出了一种采样方案,即用于主动学习的蒙特 - 卡洛帕累托优化(POAL),该方案从未标记的数据库中选择了具有固定批次大小的未标记样品的最佳子集。我们将AL采样任务施加为多目标优化问题,因此我们基于两个冲突的目标利用帕累托优化:(1)正常的AL数据采样方案(例如,最大熵)和(2)不作为OOD样本的信心。实验结果表明其对经典机器学习(ML)和深度学习(DL)任务的有效性。

Pool-based Active Learning (AL) has achieved great success in minimizing labeling cost by sequentially selecting informative unlabeled samples from a large unlabeled data pool and querying their labels from oracle/annotators. However, existing AL sampling strategies might not work well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains some data samples that do not belong to the classes of the target task. Achieving good AL performance under OOD data scenarios is a challenging task due to the natural conflict between AL sampling strategies and OOD sample detection. AL selects data that are hard to be classified by the current basic classifier (e.g., samples whose predicted class probabilities have high entropy), while OOD samples tend to have more uniform predicted class probabilities (i.e., high entropy) than in-distribution (ID) data. In this paper, we propose a sampling scheme, Monte-Carlo Pareto Optimization for Active Learning (POAL), which selects optimal subsets of unlabeled samples with fixed batch size from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem, and thus we utilize Pareto optimization based on two conflicting objectives: (1) the normal AL data sampling scheme (e.g., maximum entropy), and (2) the confidence of not being an OOD sample. Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源