论文标题
由公共数据辅助的私人查询发布
Private Query Release Assisted by Public Data
论文作者
论文摘要
我们研究了通过访问公共数据的辅助私有查询发布的问题。在这个问题中,目的是通过使用公共和私人样本组合的统计查询回答大型类$ \ MATHCAL {H} $,其错误不超过$α$。该算法仅针对私人样本来满足差异隐私。我们根据私人和公共样本复杂性研究了这项任务的局限性。 首先,我们证明我们可以仅使用$ d/α$公共样本和$ \ sqrt {p} d^{3/2}/α^2 $ private样品,其中$ d $ d $ d $ d $ d $ d $ dimensive和$ p $ c $ p $ c $ p $ n $ $ p $ n $ $ p $ n $ $ p $ n $ $ p $ n $ 分别。相比之下,只有私人样本,即使对于具有VC-Dimension One的简单查询类也无法解决此问题,并且如果没有任何私人样本,则需要更大的公共样本$ d/α^2 $。接下来,我们给出样本复杂性下限,对$ p $和$α$表现出密切的依赖。对于决策类别的类别,我们在私人样本复杂性上给出$ \ sqrt {p}/α$的下限,只要公共样本小于$ 1/α^2 $。鉴于我们的上限,这表明对$ \ sqrt {p} $的依赖性在私人样本复杂性中是必需的。我们还为广泛的查询类别的公共样本复杂性提供了$ 1/α$的$ 1/α$,这是我们的上限,在$α$中很紧。
We study the problem of differentially private query release assisted by access to public data. In this problem, the goal is to answer a large class $\mathcal{H}$ of statistical queries with error no more than $α$ using a combination of public and private samples. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in terms of the private and public sample complexities. First, we show that we can solve the problem for any query class $\mathcal{H}$ of finite VC-dimension using only $d/α$ public samples and $\sqrt{p}d^{3/2}/α^2$ private samples, where $d$ and $p$ are the VC-dimension and dual VC-dimension of $\mathcal{H}$, respectively. In comparison, with only private samples, this problem cannot be solved even for simple query classes with VC-dimension one, and without any private samples, a larger public sample of size $d/α^2$ is needed. Next, we give sample complexity lower bounds that exhibit tight dependence on $p$ and $α$. For the class of decision stumps, we give a lower bound of $\sqrt{p}/α$ on the private sample complexity whenever the public sample size is less than $1/α^2$. Given our upper bounds, this shows that the dependence on $\sqrt{p}$ is necessary in the private sample complexity. We also give a lower bound of $1/α$ on the public sample complexity for a broad family of query classes, which by our upper bound, is tight in $α$.