Paper Title
Scaling Private Deep Learning with Low-Rank and Sparse Gradients
Paper Authors
Paper Abstract
Applying Differentially Private Stochastic Gradient Descent (DPSGD) to training modern, large-scale neural networks such as transformer-based models is a challenging task, as the magnitude of noise added to the gradients at each iteration scales with the model dimension, significantly hindering learning. We propose a unified framework, $\textsf{LSG}$, that fully exploits the low-rank and sparse structure of neural networks to reduce the dimension of gradient updates and hence alleviate the negative impacts of DPSGD. The gradient updates are first approximated with a pair of low-rank matrices. A novel strategy is then used to sparsify the gradients, yielding low-dimensional, less noisy updates that nonetheless retain the performance of the neural network. Empirical evaluation on natural language processing and computer vision tasks shows that our method outperforms other state-of-the-art baselines.
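
To make the pipeline described in the abstract concrete, below is a minimal NumPy sketch of the general recipe: project each per-example gradient through a pair of low-rank matrices, sparsify the reduced representation, clip it, add Gaussian noise in the reduced space, and map the noisy update back to the weight's shape. The abstract does not specify how $\textsf{LSG}$ constructs its factor pair or its sparsification rule, so the random orthonormal projections, top-k selection, and all hyperparameters (rank, k, clip_norm, noise_multiplier) are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_projectors(d_out, d_in, rank, rng):
    """A pair of low-rank projection matrices (random orthonormal columns here;
    the paper's construction of the factor pair may differ)."""
    P_left, _ = np.linalg.qr(rng.normal(size=(d_out, rank)))
    P_right, _ = np.linalg.qr(rng.normal(size=(d_in, rank)))
    return P_left, P_right

def sparsify_topk(x, k):
    """Zero out all but the k largest-magnitude entries
    (a stand-in for the paper's sparsification strategy)."""
    flat = x.ravel().copy()
    if k < flat.size:
        thresh = np.partition(np.abs(flat), -k)[-k]
        flat[np.abs(flat) < thresh] = 0.0
    return flat.reshape(x.shape)

def private_low_dim_update(per_example_grads, rank=4, k=8,
                           clip_norm=1.0, noise_multiplier=1.0, rng=rng):
    d_out, d_in = per_example_grads[0].shape
    P_left, P_right = make_projectors(d_out, d_in, rank, rng)

    compressed = []
    for g in per_example_grads:
        z = P_left.T @ g @ P_right               # rank x rank instead of d_out x d_in
        z = sparsify_topk(z, k)                  # sparsify the reduced representation
        z *= min(1.0, clip_norm / (np.linalg.norm(z) + 1e-12))  # per-example clipping
        compressed.append(z)

    # Gaussian noise is added in the reduced space, so the number of noisy
    # coordinates is rank * rank rather than the full d_out * d_in.
    noisy = sum(compressed) + rng.normal(0.0, noise_multiplier * clip_norm,
                                         size=(rank, rank))
    noisy /= len(per_example_grads)

    return P_left @ noisy @ P_right.T            # map back to the weight's shape

# Example: 8 per-example gradients for a 32 x 16 weight matrix.
grads = [rng.normal(size=(32, 16)) for _ in range(8)]
update = private_low_dim_update(grads)
print(update.shape)  # (32, 16)
```

Because the projections are linear, averaging and reconstruction commute, and the noise is injected into a rank x rank block rather than the full gradient, which illustrates why reducing the update dimension lessens the noise penalty of DPSGD.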