Title
Differentially Private Simple Linear Regression
Authors
Abstract
Economics and social science research often requires analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study algorithms for simple linear regression that satisfy differential privacy, a constraint that guarantees that an algorithm's output reveals little about any individual input data record, even to an attacker with arbitrary side information about the dataset. We consider the design of differentially private algorithms for simple linear regression on small datasets, with tens to hundreds of data points, which is a particularly challenging regime for differential privacy. Focusing on a particular application to small-area analysis in economics research, we study the performance of a range of algorithms that we adapt to this setting. We identify key factors that affect their performance, showing through a range of experiments that algorithms based on robust estimators (in particular, the Theil-Sen estimator) perform well on the smallest datasets, but that other, more standard algorithms do better as the dataset size increases.
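The abstract singles out the Theil-Sen estimator (the median of pairwise slopes) as the robust estimator that does best on the smallest datasets. The Python sketch below is an illustration under stated assumptions, not the authors' algorithm. It shows one generic way to privatize that estimator: compute slopes on a data-independent random matching of the points, so that replacing one (x, y) record affects at most one slope, then release the median of those slopes with the exponential mechanism over a public, bounded grid of candidate slopes. Here epsilon-DP means that replacing any single record changes the probability of any set of outputs by at most a factor of e^epsilon. The function names, the public slope range [-5, 5], the grid size, and the matching-based construction are all hypothetical choices made for illustration; only the slope is estimated, and the intercept is omitted.

```python
import bisect
import math
import random
from typing import Optional, Sequence, Tuple


def theil_sen_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Non-private Theil-Sen slope: median of the slopes of all pairs with distinct x.

    Requires at least two distinct x values.
    """
    slopes = sorted(
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    )
    mid = len(slopes) // 2
    if len(slopes) % 2:
        return slopes[mid]
    return (slopes[mid - 1] + slopes[mid]) / 2


def dp_median_exponential(
    values: Sequence[float],
    lo: float,
    hi: float,
    epsilon: float,
    grid_size: int = 1001,
    rng: Optional[random.Random] = None,
) -> float:
    """epsilon-DP median via the exponential mechanism on a public grid over [lo, hi].

    Utility u(r) = -|#{v < r} - #{v > r}|. Replacing one value changes each count
    by at most 1 (adding or removing one changes a single count by 1), so the
    sensitivity is at most 2 and Pr[r] is proportional to exp(epsilon * u(r) / 4).
    """
    rng = rng or random.Random()
    ordered = sorted(values)
    n = len(ordered)
    candidates = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    scores = []
    for r in candidates:
        below = bisect.bisect_left(ordered, r)  # values strictly less than r
        above = n - bisect.bisect_right(ordered, r)  # values strictly greater than r
        scores.append(-abs(below - above))
    best = max(scores)  # shift so the largest weight is exactly 1 (no overflow)
    weights = [math.exp(epsilon * (s - best) / 4) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def dp_theil_sen_slope_match(
    xs: Sequence[float],
    ys: Sequence[float],
    epsilon: float,
    slope_range: Tuple[float, float] = (-5.0, 5.0),
    rng: Optional[random.Random] = None,
) -> float:
    """Hypothetical DP Theil-Sen variant built on a data-independent random matching.

    Each point appears in at most one pair, so replacing one (x, y) record changes,
    adds, or removes at most one slope; the noisy median of the slopes is epsilon-DP.
    """
    rng = rng or random.Random()
    order = list(range(len(xs)))
    rng.shuffle(order)  # the matching depends only on n and the RNG, not on the data
    slopes = [
        (ys[b] - ys[a]) / (xs[b] - xs[a])
        for a, b in zip(order[0::2], order[1::2])
        if xs[b] != xs[a]
    ]
    lo, hi = slope_range
    return dp_median_exponential(slopes, lo, hi, epsilon, rng=rng)


if __name__ == "__main__":
    # Synthetic data for illustration only: y = 2x + 1 plus Gaussian noise.
    data_rng = random.Random(1)
    xs = list(range(200))
    ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 1.0) for x in xs]
    print("non-private Theil-Sen slope:", theil_sen_slope(xs, ys))
    print("DP slope (epsilon=1):", dp_theil_sen_slope_match(xs, ys, 1.0, rng=random.Random(0)))
```

The matching trades statistical efficiency for a small sensitivity: using all pairs would let a single record move n - 1 slopes at once, so the median's utility score would need far more noise at the same epsilon. The public range `slope_range` must be fixed without looking at the data, since choosing it from the data would itself leak information.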