Paper Title

Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization

Authors

Mack Sweeney, Matthew van Adelsberg, Kathryn Laskey, Carlotta Domeniconi

Abstract

Bayesian bandits using Thompson Sampling have seen increasing success in recent years. Yet existing value models (of rewards) are misspecified on many real-world problems. We demonstrate this on the User Experience Optimization (UXO) problem, providing a novel formulation as a restless, sleeping bandit with unobserved confounders plus optional stopping. Our case studies show how common misspecifications can lead to sub-optimal rewards, and we provide model extensions to address these, along with a scientific model building process practitioners can adopt or adapt to solve their own unique problems. To our knowledge, this is the first study showing the effects of overdispersion on bandit explore/exploit efficacy, tying the common notions of under- and over-confidence to over- and under-exploration, respectively. We also present the first model to exploit cointegration in a restless bandit, demonstrating that finite regret and fast and consistent optional stopping are possible by moving beyond simpler windowing, discounting, and drift models.
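For readers unfamiliar with the base algorithm the abstract builds on, the sketch below shows a minimal Beta-Bernoulli Thompson Sampling loop. It is illustrative only, not the paper's restless, sleeping-bandit UXO model; the per-arm reward probabilities, horizon, and seed are hypothetical assumptions.

```python
import numpy as np

# Minimal Beta-Bernoulli Thompson Sampling sketch (assumed setup, not the
# authors' UXO formulation). Arm conversion rates and horizon are hypothetical.
rng = np.random.default_rng(0)
true_probs = np.array([0.04, 0.05, 0.06])  # hypothetical per-arm conversion rates
alpha = np.ones(len(true_probs))           # Beta posterior "successes + 1" per arm
beta = np.ones(len(true_probs))            # Beta posterior "failures + 1" per arm

for t in range(10_000):
    # Sample a plausible reward rate for each arm from its posterior,
    # then play the arm with the highest sampled value (explore/exploit).
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = rng.random() < true_probs[arm]
    # Conjugate Beta-Bernoulli posterior update for the chosen arm.
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```

The paper's case studies concern what happens when the assumed reward model in such a loop (here, a simple Bernoulli likelihood with a Beta prior) is misspecified, e.g. when rewards are overdispersed or drift over time.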
