论文标题
形状约束对阈值匪徒问题的影响
The Influence of Shape Constraints on the Thresholding Bandit Problem
论文作者
论文摘要
我们在几个形状约束下研究了随机阈值匪徒问题(TBP)。除了(i)香草,非结构化的tbp之外,我们考虑了(ii)ARM的顺序表示$(μ_k)_k $单调地增加了MTBP,(iii)$(μ_k)_K $是un -imodal utbp and(iv)的情况。在TBP问题中,目的是在顺序游戏结束时输出一组均值高于给定阈值的武器。遗憾是错误分类的手臂与阈值之间的最高差距。在固定的预算设置中,我们为所有环境中的预期遗憾以及相关算法提供了无关的最小值费率。我们证明,遗憾的最小值是(i)$ \ sqrt {\ log(k)k/t} $ for tbp,(ii)$ \ sqrt {\ sqrt {\ log(k)/t} $ mtbp for mtbp for mtbp,(iii)$ \ sqrt {k/t} $ for utbp and(k/t} CTBP的K/T} $,其中$ k $是武器数,$ t $是预算。这些速率表明,对Minimax遗憾的$ K $的依赖性取决于形状约束。这突出了一个事实,即形状约束,从根本上修改了TBP的性质。
We investigate the stochastic Thresholding Bandit problem (TBP) under several shape constraints. On top of (i) the vanilla, unstructured TBP, we consider the case where (ii) the sequence of arm's means $(μ_k)_k$ is monotonically increasing MTBP, (iii) the case where $(μ_k)_k$ is unimodal UTBP and (iv) the case where $(μ_k)_k$ is concave CTBP. In the TBP problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the highest gap between a misclassified arm and the threshold. In the fixed budget setting, we provide problem independent minimax rates for the expected regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for TBP, (ii) $\sqrt{\log(K)/T}$ for MTBP, (iii) $\sqrt{K/T}$ for UTBP and (iv) $\sqrt{\log\log K/T}$ for CTBP, where $K$ is the number of arms and $T$ is the budget. These rates demonstrate that the dependence on $K$ of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that the shape constraints modify fundamentally the nature of the TBP.