Paper Title
Distributionally-Robust Machine Learning Using Locally Differentially-Private Data
Paper Authors
Paper Abstract
We consider machine learning, particularly regression, using locally-differentially private datasets. The Wasserstein distance is used to define an ambiguity set centered at the empirical distribution of the dataset corrupted by local differential privacy noise. The ambiguity set is shown to contain the probability distribution of the unperturbed, clean data. The radius of the ambiguity set is a function of the privacy budget, the spread of the data, and the size of the problem. Hence, machine learning with locally-differentially private datasets can be rewritten as a distributionally-robust optimization. For general distributions, the distributionally-robust optimization problem can be relaxed to a regularized machine learning problem with the Lipschitz constant of the machine learning model as the regularizer. For linear and logistic regression, this regularizer is the dual norm of the model parameters. For Gaussian data, the distributionally-robust optimization problem can be solved exactly to find an optimal regularizer. This approach yields an entirely new regularizer for training linear regression models. Training with this novel regularizer can be posed as a semi-definite program. Finally, the performance of the proposed distributionally-robust machine learning training is demonstrated on practical datasets.
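To make the abstract's central reformulation concrete: the distributionally-robust problem has the standard form min over the model parameters of the supremum, over all distributions Q within Wasserstein distance rho of the empirical distribution of the privatized data, of the expected loss under Q. The sketch below illustrates the relaxation stated for linear regression, namely empirical loss plus a dual-norm penalty on the model parameters. This is not the authors' code; the function name dro_linear_regression, the absolute-error loss (chosen because it is Lipschitz), the use of cvxpy, and the free radius parameter eps are all illustrative assumptions, since the paper ties the radius to the privacy budget, the spread of the data, and the problem size.

```python
# Minimal sketch of the dual-norm-regularized surrogate for distributionally-
# robust linear regression described in the abstract. Assumptions are noted
# in the lead-in; eps is a free stand-in for the ambiguity-set radius.
import cvxpy as cp
import numpy as np

def dro_linear_regression(X, y, eps, dual_ord=2):
    """Minimize mean absolute loss plus eps times a dual norm of the weights.

    For an l_p ground metric on the data space, the penalty uses the dual
    l_q norm with 1/p + 1/q = 1; dual_ord is the order q of that penalty.
    """
    n, d = X.shape
    w = cp.Variable(d)
    # Empirical risk under the privatized (noisy) dataset.
    empirical_loss = cp.sum(cp.abs(X @ w - y)) / n
    # Dual-norm regularizer standing in for the worst case over the
    # Wasserstein ambiguity set.
    penalty = eps * cp.norm(w, dual_ord)
    problem = cp.Problem(cp.Minimize(empirical_loss + penalty))
    problem.solve()
    return w.value

# Hypothetical usage: Laplace noise stands in for the local-differential-
# privacy perturbation of the responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.ones(5) + rng.laplace(scale=0.5, size=200)
w_hat = dro_linear_regression(X, y, eps=0.1)
```

Here eps plays the role of the ambiguity-set radius; one would expect it to grow as the privacy budget tightens, since a smaller budget injects more noise and pushes the empirical distribution further from the clean one.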