Paper Title

Avoiding Communication in Logistic Regression

Authors

Aditya Devarakonda, James Demmel

Abstract

Stochastic gradient descent (SGD) is one of the most widely used optimization methods for solving various machine learning problems. SGD solves an optimization problem by iteratively sampling a few data points from the input data, computing gradients for the selected data points, and updating the solution. However, in a parallel setting, SGD requires interprocess communication at every iteration. We introduce a new communication-avoiding technique for solving the logistic regression problem using SGD. This technique re-organizes the SGD computations into a form that communicates every $s$ iterations instead of every iteration, where $s$ is a tuning parameter. We prove theoretical flops, bandwidth, and latency upper bounds for SGD and its new communication-avoiding variant. Furthermore, we show experimental results that illustrate that the new Communication-Avoiding SGD (CA-SGD) method can achieve speedups of up to $4.97\times$ on a high-performance Infiniband cluster without altering the convergence behavior or accuracy.
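The per-iteration communication the abstract refers to can be made concrete with a small sketch. Below is a minimal illustration, not the paper's implementation, of parallel minibatch SGD for L2-regularized logistic regression with row-partitioned data; the function name, hyperparameters, and the choice of NumPy/mpi4py are assumptions made for this example.

```python
# A minimal sketch (not the paper's implementation) of the baseline the
# abstract describes: parallel minibatch SGD for L2-regularized logistic
# regression with row-partitioned data. The function name, hyperparameters,
# and the use of mpi4py are illustrative assumptions.
import numpy as np
from mpi4py import MPI


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def parallel_sgd_logreg(X_local, y_local, n_iters=100, batch_size=32,
                        lr=0.1, lam=1e-4, seed=0):
    """X_local: the rows of the data owned by this process; y_local in {0, 1}."""
    comm = MPI.COMM_WORLD
    rng = np.random.default_rng(seed + comm.Get_rank())
    w = np.zeros(X_local.shape[1])
    for _ in range(n_iters):
        # Each process samples a minibatch from its own rows.
        idx = rng.choice(X_local.shape[0], size=batch_size, replace=False)
        Xb, yb = X_local[idx], y_local[idx]
        # Local partial gradient of the logistic loss on the sampled rows.
        g_local = Xb.T @ (sigmoid(Xb @ w) - yb)
        # Interprocess communication: one allreduce per iteration to sum the
        # partial gradients into the full minibatch gradient.
        g = np.empty_like(g_local)
        comm.Allreduce(g_local, g, op=MPI.SUM)
        w -= lr * (g / (batch_size * comm.Get_size()) + lam * w)
    return w
```

In this baseline, the allreduce inside the loop is the per-iteration latency cost. The CA-SGD method described in the abstract reorganizes the SGD recurrence so that such an exchange happens only once every $s$ iterations, which is the source of the reported speedups.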
