Paper Title

Avoiding Communication in Logistic Regression

Authors

Aditya Devarakonda, James Demmel

Abstract

Stochastic gradient descent (SGD) is one of the most widely used optimization methods for solving various machine learning problems. SGD solves an optimization problem by iteratively sampling a few data points from the input data, computing gradients for the selected data points, and updating the solution. However, in a parallel setting, SGD requires interprocess communication at every iteration. We introduce a new communication-avoiding technique for solving the logistic regression problem using SGD. This technique re-organizes the SGD computations into a form that communicates every $s$ iterations instead of every iteration, where $s$ is a tuning parameter. We prove theoretical flops, bandwidth, and latency upper bounds for SGD and its new communication-avoiding variant. Furthermore, we show experimental results that illustrate that the new Communication-Avoiding SGD (CA-SGD) method can achieve speedups of up to $4.97\times$ on a high-performance Infiniband cluster without altering the convergence behavior or accuracy.
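The per-iteration communication the abstract refers to can be made concrete with a small sketch. Below is a minimal illustration, not the paper's implementation, of parallel minibatch SGD for L2-regularized logistic regression with row-partitioned data; the function name, hyperparameters, and the choice of NumPy/mpi4py are assumptions made for this example.

```python
# A minimal sketch (not the paper's implementation) of the baseline the
# abstract describes: parallel minibatch SGD for L2-regularized logistic
# regression with row-partitioned data. The function name, hyperparameters,
# and the use of mpi4py are illustrative assumptions.
import numpy as np
from mpi4py import MPI


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def parallel_sgd_logreg(X_local, y_local, n_iters=100, batch_size=32,
                        lr=0.1, lam=1e-4, seed=0):
    """X_local: the rows of the data owned by this process; y_local in {0, 1}."""
    comm = MPI.COMM_WORLD
    rng = np.random.default_rng(seed + comm.Get_rank())
    w = np.zeros(X_local.shape[1])
    for _ in range(n_iters):
        # Each process samples a minibatch from its own rows.
        idx = rng.choice(X_local.shape[0], size=batch_size, replace=False)
        Xb, yb = X_local[idx], y_local[idx]
        # Local partial gradient of the logistic loss on the sampled rows.
        g_local = Xb.T @ (sigmoid(Xb @ w) - yb)
        # Interprocess communication: one allreduce per iteration to sum the
        # partial gradients into the full minibatch gradient.
        g = np.empty_like(g_local)
        comm.Allreduce(g_local, g, op=MPI.SUM)
        w -= lr * (g / (batch_size * comm.Get_size()) + lam * w)
    return w
```

In this baseline, the allreduce inside the loop is the per-iteration latency cost. The CA-SGD method described in the abstract reorganizes the SGD recurrence so that such an exchange happens only once every $s$ iterations, which is the source of the reported speedups.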
