论文标题
多尺度非平稳随机匪徒
Multiscale Non-stationary Stochastic Bandits
论文作者
论文摘要
线性模型(例如Linucb)的经典上下文匪徒算法假设,ARM的奖励分布是由固定线性回归建模的。当线性回归模型随着时间的流逝而不是平稳时,Linucb的遗憾可以随时间线性扩展。在本文中,我们提出了一种新型的多尺度更改点检测方法,用于非平稳线性匪徒问题,称为多尺度linucb,该方法积极适应不断变化的环境。我们还提供了对多尺度linucb算法束缚的遗憾的理论分析。实验结果表明,我们提出的多尺度算法在非平稳上下文环境中优于其他最先进的算法。
Classic contextual bandit algorithms for linear models, such as LinUCB, assume that the reward distribution for an arm is modeled by a stationary linear regression. When the linear regression model is non-stationary over time, the regret of LinUCB can scale linearly with time. In this paper, we propose a novel multiscale changepoint detection method for the non-stationary linear bandit problems, called Multiscale-LinUCB, which actively adapts to the changing environment. We also provide theoretical analysis of regret bound for Multiscale-LinUCB algorithm. Experimental results show that our proposed Multiscale-LinUCB algorithm outperforms other state-of-the-art algorithms in non-stationary contextual environments.