Paper Title
Fast decentralized non-convex finite-sum optimization with recursive variance reduction
Paper Authors
Paper Abstract
This paper considers decentralized minimization of $N:=nm$ smooth non-convex cost functions equally divided over a directed network of $n$ nodes. Specifically, we describe a stochastic first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique and gradient tracking (GT) to address the stochastic and decentralized nature of the problem. We show that GT-SARAH, with appropriate algorithmic parameters, finds an $ε$-accurate first-order stationary point with $O\big(\max\big\{N^{\frac{1}{2}},n(1-λ)^{-2},n^{\frac{2}{3}}m^{\frac{1}{3}}(1-λ)^{-1}\big\}Lε^{-2}\big)$ gradient complexity, where ${(1-λ)\in(0,1]}$ is the spectral gap of the network weight matrix and $L$ is the smoothness parameter of the cost functions. This gradient complexity outperforms that of the existing decentralized stochastic gradient methods. In particular, in a big-data regime such that ${n = O(N^{\frac{1}{2}}(1-λ)^{3})}$, this gradient complexity further reduces to ${O(N^{\frac{1}{2}}Lε^{-2})}$, independent of the network topology, and matches that of the centralized near-optimal variance-reduced methods. Moreover, in this regime GT-SARAH achieves a non-asymptotic linear speedup, in that the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the centralized near-optimal algorithms that perform all gradient computations at a single node. To the best of our knowledge, GT-SARAH is the first algorithm that achieves this property. In addition, we show that appropriate choices of the local minibatch size balance the trade-off between the gradient and communication complexity of GT-SARAH. Over an infinite time horizon, we establish that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost sure and mean-squared sense.
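To make concrete how the two ingredients named in the abstract fit together, the following is a minimal single-machine simulation sketch of a GT-SARAH-style iteration: a SARAH-type recursive gradient estimator with periodic full-gradient restarts, coupled with a gradient-tracking consensus step. The callback `local_grad`, the mixing matrix `W`, and all parameter defaults are illustrative assumptions, not the paper's exact pseudocode or tuning.

```python
import numpy as np

def gt_sarah_sketch(local_grad, n, m, d, W, alpha=1e-2, q=20, T=1000, seed=0):
    """Schematic GT-SARAH-style loop, simulated on one machine.

    Assumptions (illustrative, not the paper's exact protocol):
      * local_grad(i, x, idx) returns the average gradient of node i's
        components indexed by idx, evaluated at the d-vector x;
      * W is an n x n doubly stochastic mixing matrix of the network.
    """
    rng = np.random.default_rng(seed)
    full_batch = np.arange(m)
    x = np.zeros((n, d))                                   # local iterates, one row per node
    v = np.stack([local_grad(i, x[i], full_batch)          # SARAH estimators, initialized
                  for i in range(n)])                      # with full local gradients
    y = v.copy()                                           # gradient trackers, initialized to v
    for t in range(T):
        x_new = W @ x - alpha * y                          # mix with neighbors, step along tracked gradient
        if (t + 1) % q == 0:
            # periodic restart: recompute the full local gradient at each node
            v_new = np.stack([local_grad(i, x_new[i], full_batch) for i in range(n)])
        else:
            # SARAH-type recursive variance reduction with one random local sample per node
            idx = rng.integers(m, size=n)
            v_new = np.stack([
                local_grad(i, x_new[i], [idx[i]]) - local_grad(i, x[i], [idx[i]]) + v[i]
                for i in range(n)
            ])
        y = W @ y + v_new - v                              # gradient tracking: y tracks the average of v
        x, v = x_new, v_new
    return x.mean(axis=0)                                  # consensus estimate of a stationary point
```

In this sketch each node exchanges only its iterate and its tracker with neighbors (the two `W @ ...` products), while the expensive full local gradient is recomputed only once every `q` inner steps; the gradient and communication complexities quoted in the abstract arise from balancing the inner-loop length, the local minibatch size, and the step size against the spectral gap $1-λ$.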