Paper title
Survey on Privacy-Preserving Techniques for Data Publishing
Paper authors
Abstract
The exponential growth of collected, processed, and shared microdata has given rise to concerns about individuals' privacy. As a result, laws and regulations have emerged to control what organisations do with microdata and how they protect it. Statistical Disclosure Control seeks to reduce the risk of confidential information disclosure by de-identifying the data. Such de-identification is guaranteed through privacy-preserving techniques. However, de-identification usually results in loss of information, with a possible impact on data analysis precision and model predictive performance. The main goal is to protect individuals' privacy while maintaining the interpretability of the data, i.e. its usefulness. Statistical Disclosure Control is an expanding area that needs further exploration, since there is still no solution that guarantees optimal privacy and utility. This survey focuses on all steps of the de-identification process. We present existing privacy-preserving techniques used in microdata de-identification, privacy measures suitable for several disclosure types, and information loss and predictive performance measures. In this survey, we discuss the main challenges raised by privacy constraints, describe the main approaches to handle these obstacles, review taxonomies of privacy-preserving techniques, provide a theoretical analysis of existing comparative studies, and raise multiple open issues.