Paper Title

Understanding Gradient Clipping in Private SGD: A Geometric Perspective

Paper Authors

Xiangyi Chen, Zhiwei Steven Wu, Mingyi Hong

Paper Abstract

Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information. To provide formal and rigorous privacy guarantees, many learning systems now incorporate differential privacy by training their models with (differentially) private SGD. A key step in each private SGD update is gradient clipping, which shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold. We first demonstrate how gradient clipping can prevent SGD from converging to a stationary point. We then provide a theoretical analysis that fully quantifies the clipping bias on convergence with a disparity measure between the gradient distribution and a geometrically symmetric distribution. Our empirical evaluation further suggests that the gradient distributions along the trajectory of private SGD indeed exhibit a symmetric structure that favors convergence. Together, our results provide an explanation of why private SGD with gradient clipping remains effective in practice despite its potential clipping bias. Finally, we develop a new perturbation-based technique that can provably correct the clipping bias even for instances with highly asymmetric gradient distributions.
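To make the clipping step concrete, here is a minimal pure-Python sketch of a private SGD update as the abstract describes it: each example's gradient is shrunk whenever its L2 norm exceeds a threshold, the clipped gradients are summed, and Gaussian noise scaled to the threshold is added before averaging. This is an illustrative sketch, not the authors' implementation; the function names (`clip_gradient`, `private_sgd_step`) and parameters are assumptions for the example.

```python
import math
import random

def clip_gradient(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > clip_norm:
        scale = clip_norm / norm
        return [g * scale for g in grad]
    return list(grad)

def private_sgd_step(weights, per_example_grads, clip_norm, noise_std, lr, rng):
    """One private SGD update: clip each example's gradient, sum the
    clipped gradients, add Gaussian noise proportional to clip_norm,
    average, and take a gradient step."""
    dim = len(weights)
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, clip_norm)
        for i in range(dim):
            total[i] += clipped[i]
    n = len(per_example_grads)
    return [
        w - lr * (total[i] + rng.gauss(0.0, noise_std * clip_norm)) / n
        for i, w in enumerate(weights)
    ]
```

Note that clipping is the source of the bias discussed in the paper: gradients larger than `clip_norm` are shrunk toward the origin, so the averaged update is no longer an unbiased estimate of the true gradient unless the gradient distribution has a favorable (e.g., symmetric) structure.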
