Paper title
Plausible deniability for privacy-preserving data synthesis
Paper authors
Abstract
Publishing complete data sets, especially high-dimensional ones, is one of the most challenging problems in privacy protection. Common encryption techniques cannot stop an attacker from mounting differential attacks to obtain sensitive information, while existing differential privacy algorithms take a long time to compute on high-dimensional data and must add noise, which reduces data accuracy; they are therefore poorly suited to large high-dimensional data sets. To address this, the paper designs a complete data synthesis scheme that protects data privacy around the concept of "plausible deniability". First, the paper provides theoretical support for the difference between "plausible data" and "plausible data". The scheme design is then decomposed into a data synthesis module and a privacy test module; an algorithm model is designed for each, and together they realize the privacy protection function. To evaluate the feasibility of the scheme, the paper selects the results of the 2013 community census in the United States as the high-dimensional data set and uses a Python-based simulation program to test and analyze the efficiency and reliability of the data synthesis scheme. The evaluation focuses on the privacy protection effectiveness of the scheme.
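
The abstract names two modules, a data synthesis module and a privacy test module, without giving their algorithms. The Python sketch below illustrates how such a pair might fit together; it is not the paper's implementation. Everything in it is an assumption made for illustration: the toy generative model (each attribute of a seed record is resampled uniformly with probability `resample_p`), the parameters `k` and `gamma`, and all function names. The test releases a candidate only if at least `k` records in the data set could have produced it with a probability within a factor `gamma` of the seed's.

```python
import random


def generation_prob(seed, candidate, domains, resample_p):
    """Probability that the toy model turns `seed` into `candidate`.

    Assumed model: each attribute is independently replaced by a uniform
    draw from its domain with probability `resample_p`, else kept.
    """
    prob = 1.0
    for s, c, domain in zip(seed, candidate, domains):
        kept = (1.0 - resample_p) if s == c else 0.0
        prob *= kept + resample_p / len(domain)
    return prob


def synthesize(seed, domains, resample_p, rng):
    """Data synthesis module (toy): perturb a seed record attribute-wise."""
    return tuple(
        rng.choice(domain) if rng.random() < resample_p else value
        for value, domain in zip(seed, domains)
    )


def passes_privacy_test(seed, candidate, dataset, domains, resample_p, k, gamma):
    """Privacy test module (toy): count records that plausibly explain `candidate`.

    A record counts as plausible if its generation probability is within a
    factor `gamma` of the seed's. The seed counts itself when it is in
    `dataset`.
    """
    p_seed = generation_prob(seed, candidate, domains, resample_p)
    plausible = sum(
        1
        for record in dataset
        if p_seed / gamma
        <= generation_prob(record, candidate, domains, resample_p)
        <= p_seed * gamma
    )
    return plausible >= k


def release(dataset, domains, resample_p, k, gamma, n_candidates, rng):
    """Generate candidates from random seeds and keep those that pass the test."""
    released = []
    for _ in range(n_candidates):
        seed = rng.choice(dataset)
        candidate = synthesize(seed, domains, resample_p, rng)
        if passes_privacy_test(seed, candidate, dataset, domains, resample_p, k, gamma):
            released.append(candidate)
    return released
```

In this sketch, raising `k` or lowering `gamma` makes the test stricter, so fewer candidates are released; a real scheme would replace the toy resampling model with a generative model fitted to the data.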