Paper Title

Regret and Belief Complexity Trade-off in Gaussian Process Bandits via Information Thresholding

Paper Authors

Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Brian M. Sadler, Alec Koppel

Paper Abstract

Bayesian optimization is a framework for global search via maximum a posteriori updates rather than simulated annealing, and has gained prominence for decision-making under uncertainty. In this work, we cast Bayesian optimization as a multi-armed bandit problem, where the payoff function is sampled from a Gaussian process (GP). Further, we focus on action selection via upper confidence bound (UCB) or expected improvement (EI) due to their prevalent use in practice. Prior work on GP bandits does not allow the iteration horizon $T$ to be large, as the complexity of computing the posterior parameters scales cubically with the number of past observations. To circumvent this computational burden, we propose a simple statistical test: only incorporate an action into the GP posterior when its conditional entropy exceeds an $\epsilon$ threshold. Doing so permits us to precisely characterize the trade-off between the regret bounds of GP bandit algorithms and the complexity of the posterior distributions, depending on the compression parameter $\epsilon$, for both discrete and continuous action sets. To the best of our knowledge, this is the first result that obtains sublinear regret bounds while keeping the growth of the posterior's complexity sublinear, whereas it is linear in the existing literature. Moreover, a provably finite bound on the complexity can be achieved, but the algorithm then incurs $\epsilon$-regret, meaning $\textbf{Reg}_T/T \rightarrow \mathcal{O}(\epsilon)$ as $T \rightarrow \infty$. Experimentally, we observe state-of-the-art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization, suggesting the merits of compressed GPs in bandit settings.
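
The following is a minimal Python sketch (not the authors' implementation) of the thresholding idea described above: a GP-UCB loop that admits an observation into the posterior only when the information it carries, measured as the conditional entropy of the noisy observation above that of pure noise, exceeds $\epsilon$. The RBF kernel, the class name `CompressedGPUCB`, and all parameter defaults are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel matrix between row-stacked points a and b."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-d2 / (2.0 * length_scale**2))

class CompressedGPUCB:
    """GP-UCB bandit that keeps an observation only when its information
    gain exceeds the compression threshold eps (illustrative sketch)."""

    def __init__(self, actions, noise=0.1, eps=1e-2, beta=2.0):
        self.actions = actions                    # candidate arms, shape (n, d)
        self.noise, self.eps, self.beta = noise, eps, beta
        self.X = np.empty((0, actions.shape[1]))  # retained inputs
        self.y = np.empty(0)                      # retained rewards

    def posterior(self, Xq):
        """GP posterior mean and variance at query points Xq."""
        prior_var = np.diag(rbf(Xq, Xq))
        if self.y.size == 0:
            return np.zeros(len(Xq)), prior_var
        K = rbf(self.X, self.X) + self.noise**2 * np.eye(self.y.size)
        Ks = rbf(Xq, self.X)
        mu = Ks @ np.linalg.solve(K, self.y)
        var = prior_var - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        return mu, np.maximum(var, 1e-12)

    def select(self):
        """Upper confidence bound action selection over the candidate arms."""
        mu, var = self.posterior(self.actions)
        return self.actions[np.argmax(mu + self.beta * np.sqrt(var))]

    def update(self, x, reward):
        """Statistical test: retain (x, reward) only if informative enough."""
        x = np.atleast_2d(x)
        _, var = self.posterior(x)
        # conditional entropy of the noisy observation at x minus the entropy
        # of pure noise, i.e. the information gain 0.5 * log(1 + var/noise^2)
        gain = 0.5 * np.log1p(var[0] / self.noise**2)
        if gain > self.eps:
            self.X = np.vstack([self.X, x])
            self.y = np.append(self.y, reward)

# Toy run on a 1-D grid with a synthetic reward (illustrative only).
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)[:, None]
agent = CompressedGPUCB(grid, eps=1e-2)
for _ in range(100):
    x = agent.select()
    agent.update(x, np.sin(6.0 * x[0]) + 0.1 * rng.standard_normal())
print(f"retained {agent.y.size} of 100 observations")
```

Raising `eps` shrinks the retained dictionary, and hence the cost of each posterior update, at the price of an $\epsilon$-dependent regret floor, mirroring the regret/complexity trade-off stated in the abstract.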
