Paper Title

Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping

Paper Authors

Jaewoo Lee, Daniel Kifer

Paper Abstract

Recent work on Renyi Differential Privacy has shown the feasibility of applying differential privacy to deep learning tasks. Despite their promise, however, differentially private deep networks often lag far behind their non-private counterparts in accuracy, showing the need for more research in model architectures, optimizers, etc. One of the barriers to this expanded research is the training time -- often orders of magnitude larger than training non-private networks. The reason for this slowdown is a crucial privacy-related step called "per-example gradient clipping" whose naive implementation undoes the benefits of batch training with GPUs. By analyzing the back-propagation equations we derive new methods for per-example gradient clipping that are compatible with auto-differentiation (e.g., in PyTorch and TensorFlow) and provide better GPU utilization. Our implementation in PyTorch showed significant training speed-ups (by factors of 54x - 94x for training various models with batch sizes of 128). These techniques work for a variety of architectural choices including convolutional layers, recurrent networks, attention, residual blocks, etc.
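For context, the sketch below illustrates the naive per-example gradient clipping loop the abstract refers to, which runs one backward pass per example and therefore forfeits GPU batch parallelism. This is not the paper's proposed method; the model, loss function, and clipping norm C are placeholders, and the paper's contribution is to obtain the same clipped gradients with batched, auto-differentiation-friendly computation.

```python
import torch
import torch.nn as nn

def naive_clipped_grad_sum(model: nn.Module, loss_fn, xs, ys, C: float = 1.0):
    """Sum of per-example gradients, each clipped to L2 norm at most C.

    Illustrative sketch of the *naive* approach: looping over examples
    one at a time is what makes standard DP-SGD training slow on GPUs.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):                       # one example at a time
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)  # gradient for this example only
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, C / (norm.item() + 1e-12))  # clip factor for this example
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    return summed  # in DP-SGD, calibrated noise is added to this sum before the update
```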
