论文标题
最佳的两臂土匪中最佳最佳武器识别,在较小的差距下有固定的预算
Optimal Best Arm Identification in Two-Armed Bandits with a Fixed Budget under a Small Gap
论文作者
论文摘要
我们考虑在两臂高斯强盗问题中固定预算最佳武器识别。长期以来的开放问题之一是存在一种最佳策略,在该策略下,错误识别的可能性与下限相匹配。我们表明,当预期奖励之间的差距很小时,遵循Neyman分配规则(Neyman,1934)的策略在渐近上是最佳的。首先,我们回顾了Kaufmann等人得出的下限。 (2016)。然后,我们提出了“ Neyman分配(NA)的逆概率加权(AIPW)”策略,该策略由使用Neyman分配的采样规则组成,并使用估计的标准偏差和使用AIPW估计器的建议规则组成。我们提出的策略是最佳的,因为当预算输入无限时,上限与下限匹配,并且差距为零。
We consider fixed-budget best-arm identification in two-armed Gaussian bandit problems. One of the longstanding open questions is the existence of an optimal strategy under which the probability of misidentification matches a lower bound. We show that a strategy following the Neyman allocation rule (Neyman, 1934) is asymptotically optimal when the gap between the expected rewards is small. First, we review a lower bound derived by Kaufmann et al. (2016). Then, we propose the "Neyman Allocation (NA)-Augmented Inverse Probability weighting (AIPW)" strategy, which consists of the sampling rule using the Neyman allocation with an estimated standard deviation and the recommendation rule using an AIPW estimator. Our proposed strategy is optimal because the upper bound matches the lower bound when the budget goes to infinity and the gap goes to zero.