Paper Title

An Empirical Study on the Membership Inference Attack against Tabular Data Synthesis Models

Paper Authors

Jihyeon Hyeong, Jayoung Kim, Noseong Park, Sushil Jajodia

Paper Abstract

Tabular data typically contains private and important information; thus, precautions must be taken before such data are shared with others. Although several methods (e.g., differential privacy and k-anonymity) have been proposed to prevent information leakage, tabular data synthesis models have become popular in recent years because they offer a favorable trade-off between data utility and privacy. However, recent research has shown that generative models for image data are susceptible to the membership inference attack, which can determine whether a given record was used to train a victim synthesis model. In this paper, we investigate the membership inference attack in the context of tabular data synthesis. We conduct experiments on four state-of-the-art tabular data synthesis models under two attack scenarios (i.e., one black-box and one white-box attack), and find that the membership inference attack can seriously jeopardize these models. We then evaluate how well two popular differentially private deep learning training algorithms, DP-SGD and DP-GAN, protect the models against the attack. Our key finding is that both algorithms can largely alleviate this threat, but only by sacrificing generation quality.
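
To make the threat concrete, below is a minimal sketch of one simple black-box membership inference heuristic: an attacker who only sees records sampled from the victim synthesizer guesses that a candidate record is a training member when it lies unusually close to some synthetic record. This is an illustrative distance-based attack, not necessarily the attack evaluated in the paper; the function names, toy data, and threshold are all hypothetical.

```python
# Minimal black-box membership inference sketch against a tabular synthesizer.
# Heuristic: a candidate whose nearest synthetic neighbor is very close is
# guessed to be a training member (a leaky model memorizes training rows).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def membership_scores(candidates: np.ndarray, synthetic: np.ndarray) -> np.ndarray:
    """Return one score per candidate record; higher = more likely a member."""
    index = NearestNeighbors(n_neighbors=1).fit(synthetic)
    distances, _ = index.kneighbors(candidates)  # distance to closest synthetic row
    return -distances.ravel()                    # closer => higher membership score

def infer_membership(candidates, synthetic, threshold):
    # Declare "member" when the candidate lies within `threshold` of some
    # synthetic record (in practice the threshold is tuned on attacker-side data).
    return membership_scores(candidates, synthetic) >= -threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(500, 8))                              # stand-in private table
    synthetic = train + rng.normal(scale=0.05, size=train.shape)   # overly leaky synthesizer
    non_members = rng.normal(size=(500, 8))
    tpr = infer_membership(train, synthetic, threshold=0.5).mean()
    fpr = infer_membership(non_members, synthetic, threshold=0.5).mean()
    print(f"attack accuracy: {(tpr + (1 - fpr)) / 2:.2f}")         # > 0.5 indicates leakage
```

An attack accuracy meaningfully above 0.5 means the synthetic output reveals which records were in the training set, which is the kind of leakage the paper measures on real synthesis models.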

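And a hedged sketch of the DP-SGD mitigation: per-sample gradients are clipped and Gaussian noise is added, bounding any single record's influence on the trained model. This sketch uses the Opacus library on a toy network as a stand-in for a real tabular synthesizer; the paper does not prescribe this library, and the model, noise_multiplier, and max_grad_norm values are illustrative assumptions.

```python
# DP-SGD training sketch with Opacus: per-sample gradient clipping plus
# Gaussian noise, which is the mechanism the paper evaluates as a defense.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 8))  # toy stand-in
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(256, 8), torch.randn(256, 8))
loader = DataLoader(data, batch_size=32)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # more noise => stronger privacy, worse generation quality
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

loss_fn = nn.MSELoss()
for x, y in loader:
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

print(f"epsilon spent: {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```

The noise_multiplier knob makes the paper's trade-off visible: raising it lowers the privacy budget epsilon (weakening membership inference), but degrades the quality of the synthesized data.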