Paper Title
A Primal-Dual SGD Algorithm for Distributed Nonconvex Optimization
Paper Authors
Paper Abstract
The distributed nonconvex optimization problem of minimizing a global cost function formed by a sum of $n$ local cost functions by using local information exchange is considered. This problem is an important component of many machine learning techniques with data parallelism, such as deep learning and federated learning. We propose a distributed primal--dual stochastic gradient descent (SGD) algorithm, suitable for arbitrarily connected communication networks and any smooth (possibly nonconvex) cost functions. We show that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for general nonconvex cost functions and the linear speedup convergence rate $\mathcal{O}(1/(nT))$ when the global cost function satisfies the Polyak--Łojasiewicz (P--Ł) condition, where $T$ is the total number of iterations. We also show that the output of the proposed algorithm with constant parameters linearly converges to a neighborhood of a global optimum. We demonstrate through numerical experiments the efficiency of our algorithm in comparison with the baseline centralized SGD and recently proposed distributed SGD algorithms.
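To make the primal--dual structure concrete, the following minimal NumPy sketch illustrates a generic distributed primal--dual SGD iteration of the kind the abstract describes: each agent takes a noisy gradient step on its local cost, penalized by a graph-Laplacian disagreement term, while a dual variable accumulates the consensus violation. The ring graph, the toy quadratic local costs, and the parameters eta, alpha, beta are illustrative assumptions for this sketch, not the paper's exact recursion or tuning.

# Hedged sketch of a generic distributed primal--dual SGD iteration.
# All names (eta, alpha, beta, stochastic_grad) and the toy setup are
# illustrative assumptions, not the paper's exact algorithm.
import numpy as np

rng = np.random.default_rng(0)

n, d, T = 8, 5, 2000              # agents, dimension, iterations
eta, alpha, beta = 0.05, 1.0, 0.5  # step size and penalty parameters (assumed)

# Ring-graph Laplacian L = D - A; any connected graph would do.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Toy smooth local costs f_i(x) = 0.5 * ||x - c_i||^2 with noisy gradients.
c = rng.normal(size=(n, d))

def stochastic_grad(x, i):
    # True local gradient plus zero-mean noise (stand-in for a minibatch gradient).
    return (x - c[i]) + 0.1 * rng.normal(size=d)

x = np.zeros((n, d))  # primal iterates, one row per agent
v = np.zeros((n, d))  # dual iterates enforcing consensus

for k in range(T):
    Lx = L @ x                                        # disagreement with neighbors
    g = np.stack([stochastic_grad(x[i], i) for i in range(n)])
    x_next = x - eta * (alpha * Lx + beta * v + g)    # primal descent step
    v = v + eta * beta * Lx                           # dual ascent on the consensus constraint
    x = x_next

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - c.mean(axis=0)))

For this toy problem the global minimizer is the average of the c_i, so the two printed quantities indicate how close the agents are to consensus and to a neighborhood of the optimum, mirroring the convergence behavior claimed in the abstract.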