Paper Title
Federated Stochastic Gradient Langevin Dynamics
Paper Authors
Paper Abstract
Stochastic gradient MCMC methods, such as stochastic gradient Langevin dynamics (SGLD), employ fast but noisy gradient estimates to enable large-scale posterior sampling. Although we can easily extend SGLD to distributed settings, it suffers from two issues when applied to federated non-IID data. First, the variance of these estimates increases significantly. Second, delaying communication causes the Markov chains to diverge from the true posterior even for very simple models. To alleviate both these problems, we propose conducive gradients, a simple mechanism that combines local likelihood approximations to correct gradient updates. Notably, conducive gradients are easy to compute, and since we only calculate the approximations once, they incur negligible overhead. We apply conducive gradients to distributed stochastic gradient Langevin dynamics (DSGLD) and call the resulting method federated stochastic gradient Langevin dynamics (FSGLD). We demonstrate that our approach can handle delayed communication rounds, converging to the target posterior in cases where DSGLD fails. We also show that FSGLD outperforms DSGLD for non-IID federated data with experiments on metric learning and neural networks.
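To make the setup concrete, below is a minimal NumPy sketch of a single SGLD update in which the minibatch gradient is rescaled to an unbiased full-data estimate and Gaussian noise is injected, with an optional additive correction hook standing in for a conducive gradient. The function names, the `conducive_grad` hook, and the toy Gaussian example are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sgld_step(theta, minibatch, grad_log_prior, grad_log_lik, N, step_size, rng,
              conducive_grad=None):
    """One SGLD update (Welling & Teh, 2011).

    The minibatch gradient is rescaled by N / |minibatch| so it is an unbiased
    estimate of the full-data log-likelihood gradient, and Gaussian noise with
    variance `step_size` is injected. `conducive_grad` is an optional additive
    correction term, standing in for the paper's conducive gradients (their
    exact form is not given in this abstract).
    """
    n = len(minibatch)
    grad = grad_log_prior(theta)
    grad = grad + (N / n) * sum(grad_log_lik(theta, x) for x in minibatch)
    if conducive_grad is not None:
        grad = grad + conducive_grad(theta)
    noise = rng.normal(scale=np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad + noise


# Toy usage: sample the posterior mean of a Gaussian with known unit variance.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, size=1000)
theta = np.zeros(1)
for _ in range(5000):
    batch = rng.choice(data, size=32, replace=False)
    theta = sgld_step(theta,
                      batch,
                      grad_log_prior=lambda t: -t,        # N(0, 1) prior
                      grad_log_lik=lambda t, x: (x - t),  # N(theta, 1) likelihood
                      N=len(data), step_size=1e-4, rng=rng)
```

In a distributed or federated run, each client would apply updates like this to its local data shard and communicate the chain state only occasionally, which is where the increased gradient variance and divergence under delayed communication described above arise.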