Paper Title
Variable Selection via Thompson Sampling
Paper Authors
Paper Abstract
Thompson sampling is a heuristic algorithm for the multi-armed bandit problem which has a long tradition in machine learning. The algorithm has a Bayesian spirit in the sense that it selects arms based on posterior samples of the reward probabilities of each arm. By forging a connection between combinatorial binary bandits and spike-and-slab variable selection, we propose a stochastic optimization approach to subset selection called Thompson Variable Selection (TVS). TVS is a framework for interpretable machine learning which does not require the underlying model to be linear. TVS brings together Bayesian reinforcement and machine learning in order to extend the reach of Bayesian subset selection to non-parametric models and large datasets with very many predictors and/or very many observations. Depending on the choice of a reward, TVS can be deployed in offline as well as online setups with streaming data batches. Tailoring multi-play bandits to variable selection, we provide regret bounds without necessarily assuming that the arm mean rewards are unrelated. We demonstrate very strong empirical performance on both simulated and real data. Unlike deterministic optimization methods for spike-and-slab variable selection, the stochastic nature of TVS makes it less prone to local convergence and thereby more robust.
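To make the sampling mechanism described in the abstract concrete, below is a minimal illustrative sketch of classic Beta-Bernoulli Thompson sampling for a standard multi-armed bandit: each arm's reward probability has a Beta posterior, one posterior draw is taken per arm, and the arm with the largest draw is played. This is not the paper's TVS procedure (which works on combinatorial binary bandits tied to spike-and-slab selection); the number of arms, the true reward probabilities, and the horizon are all hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true success probabilities for 3 arms (unknown to the algorithm).
true_probs = np.array([0.3, 0.5, 0.7])
K = len(true_probs)

# Beta(1, 1) priors: alpha counts successes, beta counts failures.
alpha = np.ones(K)
beta = np.ones(K)

for t in range(1000):
    # Thompson sampling step: draw one posterior sample per arm, play the argmax.
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))

    # Observe a Bernoulli reward and update that arm's posterior counts.
    reward = int(rng.random() < true_probs[arm])
    alpha[arm] += reward
    beta[arm] += 1 - reward

# Posterior means concentrate around the true probabilities of frequently played arms.
print("posterior means:", alpha / (alpha + beta))
```

In TVS the same "sample, act, update" loop operates over subsets of predictors rather than single arms, with the reward defined by the chosen offline or streaming criterion described in the abstract.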