Paper Title
Stopping criterion for active learning based on deterministic generalization bounds
Paper Authors
Abstract
Active learning is a framework in which the learning machine can select the samples to be used for training. This technique is promising, particularly when the cost of data acquisition and labeling is high. In active learning, determining the timing at which learning should be stopped is a critical issue. In this study, we propose a criterion for automatically stopping active learning. The proposed stopping criterion is based on the difference in the expected generalization errors and hypothesis testing. We derive a novel upper bound for the difference in expected generalization errors before and after obtaining a new training datum based on PAC-Bayesian theory. Unlike ordinary PAC-Bayesian bounds, though, the proposed bound is deterministic; hence, there is no uncontrollable trade-off between the confidence and tightness of the inequality. We combine the upper bound with a statistical test to derive a stopping criterion for active learning. We demonstrate the effectiveness of the proposed method via experiments with both artificial and real datasets.
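The overall procedure the abstract describes can be sketched as a pool-based active learning loop that stops once the change in generalization error between successive queries becomes negligible. The paper's actual criterion tests a deterministic PAC-Bayesian upper bound on this difference; the sketch below is a minimal illustration only, substituting a held-out error estimate for that bound and a simple patience rule for the hypothesis test. The 1D threshold classifier, the `eps`/`patience` parameters, and all function names are assumptions for illustration, not the paper's method.

```python
import random

def fit_threshold(points):
    """Fit a toy 1D classifier (predict 1 iff x >= t) by minimizing training error."""
    xs = sorted(x for x, _ in points)
    best_t, best_err = xs[0], float("inf")
    for t in xs + [xs[-1] + 1.0]:  # candidate thresholds at each sample, plus "all 0"
        err = sum((1 if x >= t else 0) != y for x, y in points)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def error(t, data):
    """Empirical 0-1 error of threshold t on a labeled dataset."""
    return sum((1 if x >= t else 0) != y for x, y in data) / len(data)

def active_learn(pool, holdout, eps=0.01, patience=3, seed=0):
    """Query by uncertainty; stop when the error change stays below eps.

    NOTE: the stopping rule here (held-out error change + patience counter) is a
    stand-in for the paper's deterministic PAC-Bayesian bound and statistical test.
    """
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    labeled = [pool.pop() for _ in range(4)]  # small initial labeled set
    t = fit_threshold(labeled)
    prev_err, calm = error(t, holdout), 0
    while pool:
        # Uncertainty sampling: query the unlabeled point closest to the boundary.
        i = min(range(len(pool)), key=lambda j: abs(pool[j][0] - t))
        labeled.append(pool.pop(i))
        t = fit_threshold(labeled)
        new_err = error(t, holdout)
        calm = calm + 1 if abs(prev_err - new_err) < eps else 0
        prev_err = new_err
        if calm >= patience:  # error change negligible for `patience` queries
            break
    return t, len(labeled)
```

On noiseless synthetic data with a true boundary at 0.5, the loop typically stops well before exhausting the pool, which is the point of a stopping criterion: further labeling no longer changes the learned hypothesis appreciably.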