Paper Title

Systematic Evaluation of Privacy Risks of Machine Learning Models

Paper Authors

Liwei Song, Prateek Mittal

Paper Abstract

Machine learning models are prone to memorizing sensitive data, making them vulnerable to membership inference attacks in which an adversary aims to guess if an input sample was used to train the model. In this paper, we show that prior work on membership inference attacks may severely underestimate the privacy risks by relying solely on training custom neural network classifiers to perform attacks and focusing only on the aggregate results over data samples, such as the attack accuracy. To overcome these limitations, we first propose to benchmark membership inference privacy risks by improving existing non-neural network based inference attacks and proposing a new inference attack method based on a modification of prediction entropy. We also propose benchmarks for defense mechanisms by accounting for adaptive adversaries with knowledge of the defense and also accounting for the trade-off between model accuracy and privacy risks. Using our benchmark attacks, we demonstrate that existing defense approaches are not as effective as previously reported. Next, we introduce a new approach for fine-grained privacy analysis by formulating and deriving a new metric called the privacy risk score. Our privacy risk score metric measures an individual sample's likelihood of being a training member, which allows an adversary to identify samples with high privacy risks and perform attacks with high confidence. We experimentally validate the effectiveness of the privacy risk score and demonstrate that the distribution of privacy risk score across individual samples is heterogeneous. Finally, we perform an in-depth investigation for understanding why certain samples have high privacy risks, including correlations with model sensitivity, generalization error, and feature embeddings. Our work emphasizes the importance of a systematic and rigorous evaluation of privacy risks of machine learning models.
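
Illustrative Code Sketches

To make the two techniques named in the abstract concrete, here are two minimal sketches. Both are illustrations under stated assumptions, not the authors' implementation. The first sketches a membership inference score based on a modification of prediction entropy: unlike plain prediction entropy, it incorporates the true label, so confident correct predictions (typical of training members) score low while confident wrong predictions score high. The function names, the exact formula, and the fixed threshold are assumptions for illustration; the paper's definition may differ.

```python
# Sketch of a membership inference attack via a *modified* prediction
# entropy, in the spirit of the abstract. Formula and thresholding are
# illustrative assumptions, not the paper's exact method.
import numpy as np

def modified_entropy(probs, label, eps=1e-12):
    """Entropy-like score that uses the true label: confident-correct
    predictions score low, confident-wrong ones score high.
    probs: (n_classes,) softmax vector; label: true class index."""
    p_y = probs[label]
    score = -(1.0 - p_y) * np.log(p_y + eps)
    for i, p_i in enumerate(probs):
        if i != label:
            score -= p_i * np.log(1.0 - p_i + eps)
    return score

def infer_membership(probs, label, threshold):
    """Guess 'member' when the modified entropy falls below a threshold.
    In practice the threshold would be tuned per class on shadow data."""
    return modified_entropy(probs, label) <= threshold

# Toy usage: a confident, correct prediction looks like a training member.
member_like = np.array([0.97, 0.02, 0.01])
nonmember_like = np.array([0.40, 0.35, 0.25])
print(infer_membership(member_like, 0, threshold=0.2))     # True
print(infer_membership(nonmember_like, 0, threshold=0.2))  # False
```

The second sketches the privacy risk score: a per-sample posterior probability of being a training member, here estimated by comparing a sample's attack score against score distributions from shadow members and shadow non-members. The Bayes formulation matches the abstract's description of "an individual sample's likelihood of being a training member"; the histogram density estimate, the prior, and all variable names are assumptions.

```python
# Sketch of a per-sample privacy risk score: P(member | model's behavior),
# estimated from shadow-model score distributions via Bayes' rule.
# Density estimation (histograms) and parameter choices are illustrative.
import numpy as np

def privacy_risk_score(sample_score, shadow_in_scores, shadow_out_scores,
                       prior_member=0.5, bins=30):
    """Bayes estimate P(member | score) from histogram densities of the
    attack score on shadow members (in) vs. shadow non-members (out)."""
    lo = min(shadow_in_scores.min(), shadow_out_scores.min())
    hi = max(shadow_in_scores.max(), shadow_out_scores.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_in, _ = np.histogram(shadow_in_scores, bins=edges, density=True)
    p_out, _ = np.histogram(shadow_out_scores, bins=edges, density=True)
    idx = np.clip(np.searchsorted(edges, sample_score) - 1, 0, bins - 1)
    num = prior_member * p_in[idx]
    den = num + (1.0 - prior_member) * p_out[idx]
    return num / den if den > 0 else prior_member

# Toy usage with synthetic shadow scores (members score lower on average):
rng = np.random.default_rng(0)
in_scores = rng.normal(0.1, 0.1, 1000)   # e.g. modified entropy on members
out_scores = rng.normal(0.6, 0.2, 1000)  # ... and on non-members
print(privacy_risk_score(0.05, in_scores, out_scores))  # near 1.0
print(privacy_risk_score(0.70, in_scores, out_scores))  # near 0.0
```

A score near 1.0 lets an adversary flag that sample as a high-confidence member, which is exactly the fine-grained, per-sample view of risk that aggregate metrics such as attack accuracy miss.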
