论文标题
差异私人设定联盟
Differentially Private Set Union
论文作者
论文摘要
我们在全球差异隐私模型中研究了集合的基本操作。在此问题中,我们将获得一个宇宙$ u $的物品,可能是无限大小的,以及一个数据库$ d $的用户。每个用户$ i $贡献一个项目的子集$ w_i \ subseteq u $。我们想要一个($ε$,$δ$) - 差异化私有算法,该算法输出子集$ s \ subset \ cup_i w_i $,以使$ s $的大小尽可能大。这个问题是在无数现实世界的应用中引起的。它在自然语言处理(NLP)应用中尤其无处不在,作为词汇提取。例如,从属于用户的私人文本数据中发现单词,句子,$ n $ grams等是设置联合问题的实例。 该问题的已知算法通过从每个用户收集一部分项目,将这些子集的结合并披露其嘈杂计数超过一定阈值的项目来进行。至关重要的是,在上述过程中,每个用户的贡献始终与其他用户所持有的项目无关,从而导致浪费的聚合过程,在这种过程中,某些项目计数恰好高于阈值。我们通过允许用户以$ \ textit {依赖的时尚} $贡献其项目,以偏离上面的范式,并在$ \ textit {policy} $的指导下。在这种新环境中,确保隐私非常微妙。我们证明,任何具有某些$ \ textit {Contractive} $属性的策略都会导致差异私有算法。我们设计了两种新的算法,一种使用拉普拉斯噪声和其他高斯噪声,作为满足承包特性的特定策略实例。我们的实验表明,新算法明显优于以前已知的问题的机制。
We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($ε$,$δ$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting ensuring privacy is significantly delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.