Paper Title
Optimal Data Acquisition with Privacy-Aware Agents
Paper Authors
Paper Abstract
We study the problem faced by a data analyst or platform that wishes to collect private data from privacy-aware agents. To incentivize participation, the platform offers the agents a service in exchange for their data: a statistic computed from all agents' submitted data. Each agent decides whether to join the platform (and truthfully reveal their data) or to abstain, weighing the privacy cost of joining against the benefit of obtaining the statistic. The platform must ensure the statistic is computed in a differentially private manner and chooses a central level of noise to add to the computation, but it can also induce personalized privacy levels (or costs) by giving different weights to different agents in the computation as a function of their heterogeneous privacy preferences (which are known to the platform). We assume the platform aims to optimize the accuracy of the statistic, and must pick the privacy level of each agent to trade off between i) incentivizing more participation and ii) adding less noise to the estimate. We provide a semi-closed-form characterization of the platform's optimal choice of agent weights in two variants of our model. In both variants, we identify a common nontrivial structure in the platform's optimal solution: an instance-specific number of agents with the least stringent privacy requirements are pooled together and given the same weight, while the weights of the remaining agents decrease as a function of the strength of their privacy requirement. We also provide algorithmic results on how to find the optimal values of the noise parameter used by the platform and of the weights given to the agents.
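To illustrate how weights can induce personalized privacy levels under a single central noise level, here is a minimal sketch. It rests on assumptions the abstract does not state: data lie in a bounded range, the statistic is a weighted sum, and the noise comes from the standard Laplace mechanism. The function names `personalized_epsilons` and `noisy_weighted_statistic` are hypothetical and are not taken from the paper. Under these assumptions, changing agent i's data moves the weighted sum by at most `w_i * data_range`, so Laplace noise of scale `b` gives that agent a privacy level of `w_i * data_range / b`.

```python
import random


def personalized_epsilons(weights, noise_scale, data_range=1.0):
    """Per-agent privacy level induced by Laplace noise on a weighted sum.

    Assumption: each data point lies in an interval of width `data_range`,
    so agent i's sensitivity is weights[i] * data_range.
    """
    return [w * data_range / noise_scale for w in weights]


def noisy_weighted_statistic(data, weights, noise_scale, rng=None):
    """Weighted sum of the data plus Laplace(0, noise_scale) noise."""
    rng = rng or random.Random()
    weighted_sum = sum(w * x for w, x in zip(weights, data))
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    rate = 1.0 / noise_scale
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return weighted_sum + noise


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]   # hypothetical weights, not the optimal ones
    data = [1.0, 0.0, 1.0]      # hypothetical data in [0, 1]
    print(personalized_epsilons(weights, noise_scale=2.0))
    print(noisy_weighted_statistic(data, weights, noise_scale=2.0))
```

In this sketch, lowering an agent's weight lowers that agent's privacy loss without changing the central noise scale, which is the trade-off the abstract describes. The sketch does not attempt to compute the optimal weights or noise parameter.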