A New Perspective on Pool-Based Active Classification and False-Discovery Control

Paper Title

A New Perspective on Pool-Based Active Classification and False-Discovery Control

Authors

Lalit Jain, Kevin Jamieson

Abstract

In many scientific settings there is a need for adaptive experimental design to guide the process of identifying regions of the search space that contain as many true positives as possible, subject to a low rate of false discoveries (i.e., false alarms). Such regions of the search space could differ drastically from a predicted set that minimizes 0/1 error, and accurate identification could require very different sampling strategies. As in active learning for binary classification, this experimental design cannot be optimally chosen a priori; rather, the data must be collected sequentially and adaptively. However, unlike classification with 0/1 error, collecting data adaptively to find a set with a high true positive rate and a low false discovery rate (FDR) is not as well understood. In this paper we provide the first provably sample-efficient adaptive algorithm for this problem. Along the way we highlight connections between classification, combinatorial bandits, and FDR control, making contributions to each.
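As a rough illustration of the quantities the abstract contrasts, the sketch below computes the false discovery proportion (the empirical counterpart of FDR), the true positive rate, and the 0/1 error of a candidate set against known labels. It is not the paper's algorithm; the function name `set_metrics`, the toy pool of 100 items, and the choice of which items are positive are all assumptions made for illustration.

```python
def set_metrics(selected, positives, pool_size):
    """Return (false discovery proportion, true positive rate, 0/1 error).

    selected  -- items the experimenter reports as discoveries (hypothetical)
    positives -- items that are truly positive (known here only for illustration)
    pool_size -- total number of items in the pool
    """
    selected, positives = set(selected), set(positives)
    true_pos = len(selected & positives)    # correct discoveries
    false_pos = len(selected - positives)   # false alarms
    false_neg = len(positives - selected)   # missed positives
    # Fraction of reported discoveries that are false; 0 for an empty set.
    fdp = false_pos / max(len(selected), 1)
    # Fraction of true positives that were recovered.
    tpr = true_pos / max(len(positives), 1)
    # Fraction of the whole pool that is labeled incorrectly.
    zero_one = (false_pos + false_neg) / pool_size
    return fdp, tpr, zero_one


# Toy pool: 100 items, of which items 0..9 are truly positive.
pool_size = 100
positives = range(10)

# Reporting nothing, versus reporting items 0..11 (two false alarms).
print(set_metrics([], positives, pool_size))
print(set_metrics(range(12), positives, pool_size))
```

The three numbers answer different questions: 0/1 error weighs every mistake in the pool equally, while the false discovery proportion looks only at the reported set and the true positive rate looks only at the truly positive items.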
