论文标题
耐腐败的高斯工艺匪徒的强大分阶段消除算法
A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits
论文作者
论文摘要
我们考虑了从嘈杂和对抗性损坏的观察到的奖励来评估奖励功能的未知,连续和昂贵的顺序优化。当腐败攻击受到适当的预算$ c $,并且该功能生存在复制的内核希尔伯特空间(RKHS)中时,问题可以作为损坏的高斯流程(GP)强盗优化。我们提出了一种新型的强大消除型算法,该算法在时期运行,将探索与不经常切换相结合以选择一小部分动作,并在多个时刻播放每个动作。我们的算法,强大的GP分阶段消除(RGP-PE)成功地使鲁棒性与腐败与勘探和剥削之间的平衡,使其在对抗性腐败的存在(或缺席)的情况下,其性能降低了。当$ t $是样本的数量和$γ_T$是最大信息增益时,我们遗憾的与损坏相关的术语是$ O(Cγ_T^{3/2})$,它比现有$ O(C \ sqrt {tγ_T})$明显更严格。我们对损坏的GP土匪环境中的鲁棒性进行了首次实证研究,并表明我们的算法对各种对抗性攻击是强大的。
We consider the sequential optimization of an unknown, continuous, and expensive to evaluate reward function, from noisy and adversarially corrupted observed rewards. When the corruption attacks are subject to a suitable budget $C$ and the function lives in a Reproducing Kernel Hilbert Space (RKHS), the problem can be posed as corrupted Gaussian process (GP) bandit optimization. We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants. Our algorithm, Robust GP Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation such that its performance degrades minimally in the presence (or absence) of adversarial corruptions. When $T$ is the number of samples and $γ_T$ is the maximal information gain, the corruption-dependent term in our regret bound is $O(C γ_T^{3/2})$, which is significantly tighter than the existing $O(C \sqrt{T γ_T})$ for several commonly-considered kernels. We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.