最佳的消除算法，用于学习最佳手臂

论文标题

最佳的消除算法，用于学习最佳手臂

An Optimal Elimination Algorithm for Learning a Best Arm

论文作者

Hassidim, Avinatan, Kupfer, Ron, Singer, Yaron

论文摘要

我们考虑$（ε，δ）$ - PAC的经典问题，学习一个最佳手臂，其目标是置信度$ 1-δ$，其平均值为$ε$ - apptroximation to Multimed Bandit设置中平均值最高的手臂。这个问题是统计和学习理论中最根本的问题之一，但令人惊讶的是，其最糟糕的样本复杂性尚不清楚。在本文中，我们提出了一种新的方法（ε，δ）$ - PAC学习最好的手臂。这种方法导致了一种算法，其样品复杂性将其收敛到\ emph {}} $（ε，δ）$的最佳样品复杂性分别学习$ n $臂的平均值，我们通过有条件的匹配下限来补充这一结果。更具体地：

We consider the classic problem of $(ε,δ)$-PAC learning a best arm where the goal is to identify with confidence $1-δ$ an arm whose mean is an $ε$-approximation to that of the highest mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet somewhat surprisingly its worst-case sample complexity is not well understood. In this paper, we propose a new approach for $(ε,δ)$-PAC learning a best arm. This approach leads to an algorithm whose sample complexity converges to \emph{exactly} the optimal sample complexity of $(ε,δ)$-learning the mean of $n$ arms separately and we complement this result with a conditional matching lower bound. More specifically:

下载PDF全文

下载文献需遵守相关版权规定

论文标题