Paper Title
Easy Differentially Private Linear Regression
Paper Authors
Paper Abstract
Linear regression is a fundamental tool for statistical analysis. This has motivated the development of linear regression methods that also satisfy differential privacy and thus guarantee that the learned model reveals little about any one data point used to construct it. However, existing differentially private solutions assume that the end user can easily specify good data bounds and hyperparameters. Both present significant practical obstacles. In this paper, we study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models. Given $n$ samples of $d$-dimensional data used to train $m$ models, we construct an efficient analogue using an approximate Tukey depth that runs in time $O(d^2n + dm\log(m))$. We find that this algorithm obtains strong empirical performance in the data-rich setting with no data bounds or hyperparameter selection required.
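The abstract's pipeline can be illustrated with a simplified sketch: partition the $n$ samples into $m$ disjoint chunks, fit a non-private least-squares model on each, score every model by a coordinate-wise approximation of Tukey depth, and select one model with the exponential mechanism. This is not the paper's exact algorithm (the paper uses a more refined approximate-depth construction and sampler); the function names, the coordinate-wise depth, and the sensitivity-1 scoring are illustrative assumptions.

```python
import numpy as np

def fit_partition_models(X, y, m):
    # Split the n samples into m disjoint chunks and fit ordinary
    # least squares on each chunk; these per-chunk models are non-private.
    models = []
    for Xc, yc in zip(np.array_split(X, m), np.array_split(y, m)):
        theta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        models.append(theta)
    return np.vstack(models)  # shape (m, d)

def approx_tukey_depths(models):
    # Coordinate-wise approximation of Tukey depth: for each model,
    # take the minimum over coordinates of its one-sided rank depth
    # among the m models. Sorting each of d coordinates costs
    # O(d * m log m), matching the dm log(m) term in the abstract.
    m, d = models.shape
    depths = np.full(m, m)
    for k in range(d):
        order = np.argsort(models[:, k])
        ranks = np.empty(m, dtype=int)
        ranks[order] = np.arange(m)
        one_sided = np.minimum(ranks + 1, m - ranks)
        depths = np.minimum(depths, one_sided)
    return depths

def exponential_mechanism_select(models, epsilon, rng):
    # Changing one data point alters one chunk's model, shifting each
    # rank count by at most 1, so the depth score has sensitivity 1
    # (an assumption of this sketch) and the standard exponential
    # mechanism weights are exp(epsilon * depth / 2).
    scores = epsilon * approx_tukey_depths(models) / 2.0
    scores -= scores.max()  # shift for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return models[rng.choice(len(models), p=probs)]
```

Because no per-model quantity other than its depth rank enters the selection, the end user never supplies data bounds or clipping norms, which is the practical advantage the abstract emphasizes.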