Paper Title
Locally Differentially Private (Contextual) Bandits Learning
Paper Authors
Paper Abstract
We study locally differentially private (LDP) bandit learning in this paper. First, we propose simple black-box reduction frameworks that can solve a large family of context-free bandit learning problems with an LDP guarantee. Based on our frameworks, we improve the previous best results for private bandit learning with one-point feedback, such as private Bandit Convex Optimization, and obtain the first result for Bandit Convex Optimization (BCO) with multi-point feedback under LDP. The LDP guarantee and black-box nature make our frameworks more attractive in real applications than previous, specifically designed, and relatively weaker differentially private (DP) context-free bandit algorithms. Further, we extend our $(\varepsilon, \delta)$-LDP algorithm to Generalized Linear Bandits, which enjoys a sub-linear regret of $\tilde{O}(T^{3/4}/\varepsilon)$ and is conjectured to be nearly optimal. Note that, given the existing $\Omega(T)$ lower bound for DP contextual linear bandits (Shariff & Sheffet, 2018), our result shows a fundamental difference between LDP and DP contextual bandit learning.
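To make the black-box reduction idea concrete, here is a minimal sketch (not the paper's construction): each user privatizes their own bounded loss report with the Laplace mechanism before it leaves the device, so any bandit learner that tolerates noisy feedback can be run unchanged on the privatized values. The `ldp_privatize` helper, the `NoisyArmLearner` explore-then-commit toy learner, and the Laplace-mechanism choice are all illustrative assumptions, not the algorithms from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ldp_privatize(loss, epsilon, loss_range=1.0):
    """Privatize one bounded loss report with the Laplace mechanism.
    For a loss lying in an interval of width `loss_range`, noise of
    scale loss_range / epsilon makes this single report epsilon-LDP."""
    return loss + rng.laplace(scale=loss_range / epsilon)

class NoisyArmLearner:
    """Toy noise-tolerant learner (hypothetical, for illustration only):
    explore-then-commit over k arms using empirical means of noisy losses."""
    def __init__(self, k, explore_rounds):
        self.k, self.explore_rounds = k, explore_rounds
        self.sums, self.counts, self.t = np.zeros(k), np.zeros(k), 0

    def query(self):
        # Round-robin exploration, then commit to the best empirical arm.
        if self.t < self.explore_rounds:
            return self.t % self.k
        return int(np.argmin(self.sums / np.maximum(self.counts, 1)))

    def update(self, arm, noisy_loss):
        self.sums[arm] += noisy_loss
        self.counts[arm] += 1
        self.t += 1

# Black-box reduction: the learner only ever sees privatized feedback.
true_means = np.array([0.9, 0.5, 0.7])  # per-arm expected losses
learner, epsilon = NoisyArmLearner(k=3, explore_rounds=300), 1.0
for _ in range(1000):
    arm = learner.query()
    loss = rng.binomial(1, true_means[arm])            # raw loss stays local
    learner.update(arm, ldp_privatize(loss, epsilon))  # only noisy value sent
print("committed arm:", learner.query())  # should favor arm 1 (lowest mean)
```

Because privacy is enforced at the point where each report is generated, the guarantee holds regardless of which downstream learner consumes the noisy values; this is what makes the reduction black-box.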