论文标题

在私人随机凸的差异上优化和重尾数据

On Differentially Private Stochastic Convex Optimization with Heavy-tailed Data

论文作者

Wang, Di, Xiao, Hanshen, Devadas, Srini, Xu, Jinhui

论文摘要

在本文中,我们考虑了在重尾数据上设计差异化私有(DP)算法(SCO)的问题。此类数据的不规则性违反了几乎所有现有的DP-SCO和DP-MER方法中使用的一些关键假设,从而导致未能提供DP保证。为了更好地了解这种类型的挑战,我们在本文中提供了对DP-SCO在各种情况下进行的全面研究。首先,我们考虑损失函数强烈凸出和光滑的情况。在这种情况下,我们提出了一种基于样本和聚集框架的方法,该方法的人口过剩风险为$ \ tilde {o}(\ frac {d^3} {d^3} {nε^4})$(省略其他因素),其中$ n $是样本大小,$ d $是数据的尺寸。然后,我们表明,有了一些关于损失功能的其他假设,可以将\ textit {预期}过量的总体风险降低至$ \ tilde {o}(\ frac {d^2} {nε^2})$。为了提高这些额外的条件,我们还提供了基于梯度平滑和基于修剪的方案,以实现$ \ tilde {o}(\ frac {d^2} {d^2} {nε^2})$和$ \ tilde {o}(\ frac {d^\ frac {2} {3}} {(nε^2)^\ frac {1} {1} {3}})$,分别为convex和一般convex损失功能,分别具有很高的概率}。实验表明,我们的算法可以有效地应对数据不规则性引起的挑战。

In this paper, we consider the problem of designing Differentially Private (DP) algorithms for Stochastic Convex Optimization (SCO) on heavy-tailed data. The irregularity of such data violates some key assumptions used in almost all existing DP-SCO and DP-ERM methods, resulting in failure to provide the DP guarantees. To better understand this type of challenges, we provide in this paper a comprehensive study of DP-SCO under various settings. First, we consider the case where the loss function is strongly convex and smooth. For this case, we propose a method based on the sample-and-aggregate framework, which has an excess population risk of $\tilde{O}(\frac{d^3}{nε^4})$ (after omitting other factors), where $n$ is the sample size and $d$ is the dimensionality of the data. Then, we show that with some additional assumptions on the loss functions, it is possible to reduce the \textit{expected} excess population risk to $\tilde{O}(\frac{ d^2}{ nε^2 })$. To lift these additional conditions, we also provide a gradient smoothing and trimming based scheme to achieve excess population risks of $\tilde{O}(\frac{ d^2}{nε^2})$ and $\tilde{O}(\frac{d^\frac{2}{3}}{(nε^2)^\frac{1}{3}})$ for strongly convex and general convex loss functions, respectively, \textit{with high probability}. Experiments suggest that our algorithms can effectively deal with the challenges caused by data irregularity.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源