Paper Title
Locally Differentially Private (Contextual) Bandits Learning
Paper Authors
Paper Abstract
We study locally differentially private (LDP) bandit learning in this paper. First, we propose simple black-box reduction frameworks that can solve a large family of context-free bandit learning problems with an LDP guarantee. Based on our frameworks, we improve the previous best results for private bandit learning with one-point feedback, such as private Bandit Convex Optimization, and obtain the first result for Bandit Convex Optimization (BCO) with multi-point feedback under LDP. The LDP guarantee and black-box nature make our frameworks more attractive in real applications than previous, specifically designed, and relatively weaker differentially private (DP) context-free bandit algorithms. Further, we extend our $(\varepsilon, \delta)$-LDP algorithm to Generalized Linear Bandits, which enjoys a sub-linear regret of $\tilde{O}(T^{3/4}/\varepsilon)$ and is conjectured to be nearly optimal. Note that, given the existing $\Omega(T)$ lower bound for DP contextual linear bandits (Shariff & Sheffet, 2018), our result shows a fundamental difference between LDP and DP contextual bandit learning.
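To make the black-box reduction idea concrete, here is a minimal sketch (not the paper's construction): each user privatizes their own bounded loss report with the Laplace mechanism before it leaves the device, so any bandit learner that tolerates noisy feedback can be run unchanged on the privatized values. The `ldp_privatize` helper, the `NoisyArmLearner` explore-then-commit toy learner, and the Laplace-mechanism choice are all illustrative assumptions, not the algorithms from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ldp_privatize(loss, epsilon, loss_range=1.0):
    """Privatize one bounded loss report with the Laplace mechanism.
    For a loss lying in an interval of width `loss_range`, noise of
    scale loss_range / epsilon makes this single report epsilon-LDP."""
    return loss + rng.laplace(scale=loss_range / epsilon)

class NoisyArmLearner:
    """Toy noise-tolerant learner (hypothetical, for illustration only):
    explore-then-commit over k arms using empirical means of noisy losses."""
    def __init__(self, k, explore_rounds):
        self.k, self.explore_rounds = k, explore_rounds
        self.sums, self.counts, self.t = np.zeros(k), np.zeros(k), 0

    def query(self):
        # Round-robin exploration, then commit to the best empirical arm.
        if self.t < self.explore_rounds:
            return self.t % self.k
        return int(np.argmin(self.sums / np.maximum(self.counts, 1)))

    def update(self, arm, noisy_loss):
        self.sums[arm] += noisy_loss
        self.counts[arm] += 1
        self.t += 1

# Black-box reduction: the learner only ever sees privatized feedback.
true_means = np.array([0.9, 0.5, 0.7])  # per-arm expected losses
learner, epsilon = NoisyArmLearner(k=3, explore_rounds=300), 1.0
for _ in range(1000):
    arm = learner.query()
    loss = rng.binomial(1, true_means[arm])            # raw loss stays local
    learner.update(arm, ldp_privatize(loss, epsilon))  # only noisy value sent
print("committed arm:", learner.query())  # should favor arm 1 (lowest mean)
```

Because privacy is enforced at the point where each report is generated, the guarantee holds regardless of which downstream learner consumes the noisy values; this is what makes the reduction black-box.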