Paper Title
Estimating g-Leakage via Machine Learning
Paper Authors
Paper Abstract
This paper considers the problem of estimating the information leakage of a system in the black-box scenario. It is assumed that the system's internals are unknown to the learner, or anyway too complicated to analyze, and the only available information is pairs of input-output data samples, possibly obtained by submitting queries to the system or provided by a third party. Previous research has mainly focused on counting frequencies to estimate the input-output conditional probabilities (referred to as the frequentist approach); however, this method is not accurate when the domain of possible outputs is large. To overcome this difficulty, the estimation of the Bayes error of the ideal classifier was recently investigated using Machine Learning (ML) models, and it has been shown to be more accurate thanks to the ability of those models to learn the input-output correspondence. However, the Bayes vulnerability is only suitable for describing one-try attacks. A more general and flexible measure of leakage is the g-vulnerability, which encompasses several different types of adversaries, with different goals and capabilities. In this paper, we propose a novel approach to perform black-box estimation of the g-vulnerability using ML. A feature of our approach is that it does not require estimating the conditional probabilities, and that it is suitable for a large class of ML algorithms. First, we formally show learnability for all data distributions. Then, we evaluate the performance via various experiments using k-Nearest Neighbors and Neural Networks. Our results outperform the frequentist approach when the domain of observables is large.
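To make the frequentist baseline mentioned above concrete, the following is a minimal illustrative sketch (not the paper's implementation) of estimating g-vulnerability from a list of input-output samples. It estimates the joint distribution by counting and then computes the empirical g-vulnerability V_g = Σ_y max_w Σ_x p̂(x, y)·g(w, x); the function names and the identity-gain example are our own illustrative choices. As the abstract notes, this estimator degrades when the space of observables y is large relative to the number of samples, which is the regime where the ML-based approach is claimed to help.

```python
from collections import Counter, defaultdict


def frequentist_g_vulnerability(samples, guesses, gain):
    """Frequentist estimate of the g-vulnerability from samples.

    samples: list of (x, y) pairs, where x is the secret input and
             y is the observable output of the system.
    guesses: iterable of the adversary's possible guesses w.
    gain:    function gain(w, x) giving the adversary's gain for
             guessing w when the secret is x.
    """
    n = len(samples)
    joint = Counter(samples)               # empirical joint counts c(x, y)
    by_y = defaultdict(dict)               # y -> {x: empirical p(x, y)}
    for (x, y), c in joint.items():
        by_y[y][x] = c / n
    # V_g = sum over observed y of the best achievable expected gain
    return sum(
        max(sum(p * gain(w, x) for x, p in px.items()) for w in guesses)
        for px in by_y.values()
    )


# With the identity gain, g-vulnerability reduces to Bayes vulnerability:
identity_gain = lambda w, x: 1.0 if w == x else 0.0
# A deterministic channel (y = x) leaks everything, so the estimate is 1.0:
print(frequentist_g_vulnerability([(0, 0), (1, 1)], [0, 1], identity_gain))
```

The ML approach sketched in the abstract instead trains a classifier (e.g. k-NN or a neural network) to map observables directly to high-gain guesses, and scores its average gain on held-out samples, avoiding the explicit estimation of the conditional probabilities altogether.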