与手臂依赖延迟的随机匪徒

论文标题

与手臂依赖延迟的随机匪徒

Stochastic bandits with arm-dependent delays

论文作者

Manegueu, Anne Gael, Vernade, Claire, Carpentier, Alexandra, Valko, Michal

论文摘要

由于其在应用中的相关性，最近已致力于随机延迟的匪徒的大量工作。然而，现有算法的适用性受到以下事实的限制：强有力的假设通常是对延迟分布（例如完全可观察性，限制性形状限制或对武器的均匀性）的限制。在这项工作中，我们大大削弱了它们，只假设延迟的尾巴上有一个绑定。特别是，我们介绍了延迟分布在各种武器之间变化的重要情况，以及延误是重尾的情况。在解决这些困难时，我们提出了一种简单但有效的基于UCB的算法，称为DivateBandits。我们在遗憾和绩效下界提供依赖问题和与问题无关的界限。

Significant work has been recently dedicated to the stochastic delayed bandit setting because of its relevance in applications. The applicability of existing algorithms is however restricted by the fact that strong assumptions are often made on the delay distributions, such as full observability, restrictive shape constraints, or uniformity over arms. In this work, we weaken them significantly and only assume that there is a bound on the tail of the delay. In particular, we cover the important case where the delay distributions vary across arms, and the case where the delays are heavy-tailed. Addressing these difficulties, we propose a simple but efficient UCB-based algorithm called the PatientBandits. We provide both problems-dependent and problems-independent bounds on the regret as well as performance lower bounds.

下载PDF全文

下载文献需遵守相关版权规定

论文标题