Title

Kamino: Constraint-Aware Differentially Private Data Synthesis

Authors

Chang Ge, Shubhankar Mohapatra, Xi He, Ihab F. Ilyas

Abstract

Organizations are increasingly relying on data to support decisions. When data contains private and sensitive information, the data owner often wants to publish a synthetic database instance that is as useful as the true data while ensuring the privacy of individual data records. Existing differentially private data synthesis methods aim to generate data that is useful for target applications, but they fail to preserve one of the most fundamental properties of structured data: the underlying correlations and dependencies among tuples and attributes (i.e., the structure of the data). This structure is often expressed as integrity and schema constraints, or with a probabilistic generative process. As a result, the synthesized data is not useful for any downstream task that requires this structure to be preserved. This work presents Kamino, a data synthesis system that ensures differential privacy and preserves the structure and correlations present in the original dataset. Kamino takes as input a database instance, along with its schema (including integrity constraints), and produces a synthetic database instance with differential privacy and structure-preservation guarantees. We empirically show that, while preserving the structure of the data, Kamino achieves comparable or even better utility than state-of-the-art differentially private data synthesis methods in training classification models and answering marginal queries.
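To make "structure expressed as integrity constraints" concrete, the sketch below counts violations of a functional dependency (FD), one common kind of integrity constraint, in a table. It only illustrates what it means for synthetic data to break structure that the true data satisfies; it is not Kamino's synthesis algorithm. The `zip -> city` schema, the sample rows, and the helper name `fd_violations` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence

# A row maps attribute names to values (hypothetical, string-valued schema).
Row = Dict[str, str]


def fd_violations(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> int:
    """Count unordered pairs of rows that violate the FD lhs -> rhs.

    A pair violates the dependency when the two rows agree on every
    attribute in lhs but disagree on the rhs attribute.
    """
    count = 0
    for a, b in combinations(rows, 2):
        if all(a[attr] == b[attr] for attr in lhs) and a[rhs] != b[rhs]:
            count += 1
    return count


# Hypothetical "true" table: zip code functionally determines city.
true_rows = [
    {"zip": "53703", "city": "Madison"},
    {"zip": "53703", "city": "Madison"},
    {"zip": "10001", "city": "New York"},
]

# Hypothetical synthetic table whose marginals look plausible but whose
# second row pairs zip 53703 with the wrong city, breaking the FD.
synthetic_rows = [
    {"zip": "53703", "city": "Madison"},
    {"zip": "53703", "city": "New York"},
    {"zip": "10001", "city": "New York"},
]

print(fd_violations(true_rows, ["zip"], "city"))       # 0
print(fd_violations(synthetic_rows, ["zip"], "city"))  # 1
```

The pairwise check is quadratic in the number of rows and is kept for readability; grouping rows by their `lhs` values and checking each group for more than one distinct `rhs` value would scale better on real tables.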