Paper title
Private data sharing between decentralized users through the privGAN architecture
Paper authors
Paper abstract
More data is almost always beneficial for analysis and machine learning tasks. In many realistic situations, however, an enterprise cannot share its data, either to keep a competitive advantage or to protect the privacy of the data sources, such as the enterprise's clients. We propose a method for data owners to share synthetic or fake versions of their data without sharing either the actual data or the parameters of models that have direct access to the data. The proposed method is based on the privGAN architecture, in which local GANs are trained on their respective data subsets with an extra penalty from a central discriminator that aims to identify the origin of a given fake sample. We demonstrate that this approach, when applied to subsets of various sizes, yields better utility for the owners than their small real datasets alone. The only shared pieces of information are the parameter updates of the central discriminator. Privacy is demonstrated with white-box attacks on the most vulnerable elements of the architecture, with results close to random guessing. This method would apply naturally in a federated learning setting.
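To make the training signal described in the abstract more concrete, the following is a minimal sketch of how one owner's generator loss might combine a local adversarial term with the extra penalty from the central discriminator. It is an illustration under stated assumptions, not the authors' implementation: the function names, the non-saturating form of the local loss, the log-probability form of the penalty, and the `penalty_weight` parameter are all hypothetical choices made here.

```python
import numpy as np

EPS = 1e-12  # numerical guard for log(0); an assumption of this sketch


def local_adversarial_loss(d_local_on_fake):
    """Non-saturating generator loss against the owner's local discriminator.

    d_local_on_fake: array of shape (batch,), the local discriminator's
    probability that each fake sample is real.
    """
    return float(-np.mean(np.log(d_local_on_fake + EPS)))


def origin_penalty(central_probs, owner_index):
    """Mean log-probability that the central discriminator assigns to the
    true origin of the fake samples.

    central_probs: array of shape (batch, n_owners), each row sums to 1.
    The value rises toward 0 as the central discriminator becomes more
    confident about the correct origin, so a generator minimizing it is
    pushed toward samples whose origin is harder to identify.
    """
    return float(np.mean(np.log(central_probs[:, owner_index] + EPS)))


def generator_loss(d_local_on_fake, central_probs, owner_index, penalty_weight=1.0):
    """Hypothetical combined loss: local GAN term plus weighted origin penalty."""
    return (local_adversarial_loss(d_local_on_fake)
            + penalty_weight * origin_penalty(central_probs, owner_index))


# Two owners, a batch of four fake samples produced by owner 0.
d_local = np.full(4, 0.5)
uniform = np.full((4, 2), 0.5)                  # central discriminator cannot tell
confident = np.tile(np.array([0.9, 0.1]), (4, 1))  # central discriminator spots owner 0

loss_uniform = generator_loss(d_local, uniform, owner_index=0)
loss_confident = generator_loss(d_local, confident, owner_index=0)
```

In this sketch, `loss_confident` is larger than `loss_uniform`, reflecting the idea that a generator is penalized when the central discriminator can tell which owner produced a sample. Only the central discriminator's outputs enter the penalty, which is consistent with the abstract's statement that its parameter updates are the only shared information.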