论文标题
具有当地差异隐私的多军匪徒
Multi-Armed Bandits with Local Differential Privacy
论文作者
论文摘要
本文研究了遗憾的最小化多臂强盗(MAB)问题的问题,并提供了当地差异隐私(LDP)保证的问题。在随机匪徒系统中,奖励可能是指用户的活动,这可能涉及私人信息,用户可能不希望代理商知道。但是,在许多情况下,代理商需要了解这些活动,以提供更好的服务,例如建议和新闻提要。为了应对这一难题,我们采用差异隐私,并通过给定的LDP保证来研究MAB算法的遗憾上限和下限。在本文中,我们证明了一个下限,并提出了算法,其遗憾的上限与下限与恒定因素相匹配。数值实验也证实了我们的结论。
This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with local differential privacy (LDP) guarantee. In stochastic bandit systems, the rewards may refer to the users' activities, which may involve private information and the users may not want the agent to know. However, in many cases, the agent needs to know these activities to provide better services such as recommendations and news feeds. To handle this dilemma, we adopt differential privacy and study the regret upper and lower bounds for MAB algorithms with a given LDP guarantee. In this paper, we prove a lower bound and propose algorithms whose regret upper bounds match the lower bound up to constant factors. Numerical experiments also confirm our conclusions.