Paper title
High dimensional stochastic linear contextual bandit with missing covariates
Authors
Abstract
Recent work on bandit problems has adopted lasso convergence theory in the sequential decision-making setting. Even with fully observed contexts, there are technical challenges that hinder the application of existing lasso convergence theory: 1) proving the restricted eigenvalue condition under conditionally sub-Gaussian noise and 2) accounting for the dependence between the context variables and the chosen actions. This paper studies the effect of missing covariates on regret for stochastic linear bandit algorithms. Our work provides a high-probability upper bound on the regret incurred by the proposed algorithm in terms of covariate sampling probabilities, showing that the regret degrades due to missingness by at most $\zeta_{\min}^2$, where $\zeta_{\min}$ is the minimum probability of observing covariates in the context vector. We illustrate our algorithm on the practical application of experimental design for collecting gene expression data through sequential selection of class-discriminating DNA probes.