Paper Title

Algorithms for Non-Stationary Generalized Linear Bandits

Paper Authors

Yoan Russac, Olivier Cappé, Aurélien Garivier

Paper Abstract

The statistical framework of Generalized Linear Models (GLM) can be applied to sequential problems involving categorical or ordinal rewards associated, for instance, with clicks, likes or ratings. In the example of binary rewards, logistic regression is well known to be preferable to standard linear modeling. Previous works have shown how to deal with GLMs in contextual online learning with bandit feedback when the environment is assumed to be stationary. In this paper, we relax this latter assumption and propose two upper-confidence-bound based algorithms that make use of either a sliding window or a discounted maximum-likelihood estimator. We provide theoretical guarantees on the behavior of these algorithms for general context sequences and in the presence of abrupt changes. These results take the form of high-probability upper bounds on the dynamic regret of order d^{2/3} G^{1/3} T^{2/3}, where d, T and G are respectively the dimension of the unknown parameter, the number of rounds, and the number of breakpoints up to time T. The empirical performance of the algorithms is illustrated in simulated environments.
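
For concreteness, below is a minimal Python sketch of the discounted maximum-likelihood idea described in the abstract, applied to a logistic (binary-reward) model with a UCB-style exploration bonus. The class name, the Newton-step fitting routine, and the parameters gamma, alpha and lam are illustrative assumptions, not the paper's exact algorithm or its calibrated confidence width; the sliding-window variant would simply drop observations older than a fixed window instead of geometrically discounting them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DiscountedGLMUCB:
    """Illustrative discounted-MLE UCB sketch for a logistic bandit.

    gamma : discount factor in (0, 1]; older observations are down-weighted by gamma^age.
    alpha : exploration constant scaling the confidence width (a tuning knob here,
            not the paper's calibrated value).
    lam   : ridge regularization keeping the weighted design matrix invertible.
    """

    def __init__(self, dim, gamma=0.99, alpha=1.0, lam=1.0):
        self.dim, self.gamma, self.alpha, self.lam = dim, gamma, alpha, lam
        self.contexts, self.rewards = [], []
        self.theta = np.zeros(dim)

    def _weights(self):
        # Most recent observation gets weight 1, older ones gamma, gamma^2, ...
        n = len(self.rewards)
        return self.gamma ** np.arange(n - 1, -1, -1)

    def _fit(self, n_iter=20):
        # Discounted (weighted) logistic MLE via a few Newton steps, warm-started.
        X = np.asarray(self.contexts)
        y = np.asarray(self.rewards, dtype=float)
        w = self._weights()
        theta = self.theta.copy()
        for _ in range(n_iter):
            p = sigmoid(X @ theta)
            grad = X.T @ (w * (p - y)) + self.lam * theta
            H = (X * (w * p * (1 - p))[:, None]).T @ X + self.lam * np.eye(self.dim)
            theta -= np.linalg.solve(H, grad)
        self.theta = theta

    def select(self, arms):
        # UCB score: predicted mean reward plus a width from the discounted design matrix.
        if not self.rewards:
            return np.random.randint(len(arms))
        X = np.asarray(self.contexts)
        w = self._weights()
        V = (X * w[:, None]).T @ X + self.lam * np.eye(self.dim)
        V_inv = np.linalg.inv(V)
        scores = [sigmoid(a @ self.theta) + self.alpha * np.sqrt(a @ V_inv @ a) for a in arms]
        return int(np.argmax(scores))

    def update(self, context, reward):
        self.contexts.append(np.asarray(context, dtype=float))
        self.rewards.append(reward)
        self._fit()

if __name__ == "__main__":
    # Toy run against a fixed logistic model (hypothetical parameters, for illustration only).
    rng = np.random.default_rng(0)
    true_theta = np.array([0.8, -0.5, 0.3])
    bandit = DiscountedGLMUCB(dim=3, gamma=0.95, alpha=0.5)
    for t in range(200):
        arms = [rng.normal(size=3) for _ in range(5)]
        k = bandit.select(arms)
        reward = float(rng.random() < sigmoid(arms[k] @ true_theta))
        bandit.update(arms[k], reward)
    print("estimated theta:", np.round(bandit.theta, 2))
```

The refit over the full (discounted) history is kept deliberately simple; an efficient implementation would update the estimator incrementally, and the sliding-window counterpart would truncate `contexts` and `rewards` to the most recent window before fitting.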
