Paper Title

Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization

Paper Authors

Huan Li, Zhouchen Lin, Yongchun Fang

Paper Abstract

We study stochastic decentralized optimization for the problem of training machine learning models with large-scale distributed data. We extend the widely used EXTRA and DIGing methods with variance reduction (VR), and propose two methods: VR-EXTRA and VR-DIGing. The proposed VR-EXTRA requires $O((κ_s+n)\log\frac{1}{ε})$ stochastic gradient evaluations and $O((κ_b+κ_c)\log\frac{1}{ε})$ communication rounds to reach precision $ε$, which are the best complexities among non-accelerated gradient-type methods, where $κ_s$ and $κ_b$ are the stochastic condition number and batch condition number for strongly convex and smooth problems, respectively, $κ_c$ is the condition number of the communication network, and $n$ is the sample size on each distributed node. The proposed VR-DIGing has a slightly higher communication cost of $O((κ_b+κ_c^2)\log\frac{1}{ε})$. Our stochastic gradient computation complexities are the same as those of single-machine VR methods, such as SAG, SAGA, and SVRG, and our communication complexities remain the same as those of EXTRA and DIGing, respectively. To further speed up the convergence, we also propose accelerated VR-EXTRA and VR-DIGing with both the optimal $O((\sqrt{nκ_s}+n)\log\frac{1}{ε})$ stochastic gradient computation complexity and the optimal $O(\sqrt{κ_bκ_c}\log\frac{1}{ε})$ communication complexity. Our stochastic gradient computation complexity matches that of single-machine accelerated VR methods, such as Katyusha, and our communication complexity matches that of accelerated full-batch decentralized methods, such as MSDA.
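The abstract only states complexities, but the construction it describes, plugging a variance-reduced gradient estimator into a decentralized update such as EXTRA, is concrete enough to sketch. Below is a minimal NumPy illustration that combines an SVRG-style estimator with the standard EXTRA recursion on a toy decentralized least-squares problem. The ring topology, step size, snapshot schedule, and problem data are all assumptions made for this sketch; this naive combination is meant only to show the two ingredients and is not the paper's exact VR-EXTRA update or its analysis.

```python
# Minimal sketch: SVRG-style variance reduction plugged into the EXTRA
# update, on a toy decentralized least-squares problem. All problem data,
# the ring topology, the step size, and the snapshot schedule are
# illustrative assumptions, not the paper's exact VR-EXTRA method.
import numpy as np

rng = np.random.default_rng(0)

# m nodes, each holding n local samples of dimension d.
# Global objective: (1/m) * sum_i f_i(x),  f_i(x) = (1/2n) ||A_i x - b_i||^2.
m, n, d = 4, 50, 5
A = rng.normal(size=(m, n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.01 * rng.normal(size=(m, n))

# Symmetric doubly stochastic mixing matrix W for a ring of m nodes.
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25
I = np.eye(m)
W_tilde = (I + W) / 2  # the W-tilde used by EXTRA

def full_grad(i, x):
    """Exact gradient of f_i at x (costs n sample gradients)."""
    return A[i].T @ (A[i] @ x - b[i]) / n

def stoch_grad(i, x, j):
    """Gradient of the j-th sample's loss at node i."""
    return A[i, j] * (A[i, j] @ x - b[i, j])

alpha = 0.02        # step size, hand-tuned for this toy problem
snapshot_every = n  # refresh snapshots once per "epoch", as in SVRG

X = np.zeros((m, d))  # row i is node i's iterate
snap = X.copy()       # each node's SVRG snapshot point
snap_grad = np.array([full_grad(i, snap[i]) for i in range(m)])

def vr_grad(X):
    """Unbiased SVRG estimate of each node's local gradient."""
    G = np.empty((m, d))
    for i in range(m):
        j = rng.integers(n)  # one sampled gradient per node per round
        G[i] = stoch_grad(i, X[i], j) - stoch_grad(i, snap[i], j) + snap_grad[i]
    return G

# EXTRA recursion with full gradients replaced by VR estimates:
#   X^{k+1} = (I + W) X^k - W_tilde X^{k-1} - alpha (G^k - G^{k-1})
G_prev = vr_grad(X)
X_prev, X = X, W @ X - alpha * G_prev  # first EXTRA step

for k in range(1, 3000):
    if k % snapshot_every == 0:  # periodic full-gradient snapshot
        snap = X.copy()
        snap_grad = np.array([full_grad(i, snap[i]) for i in range(m)])
    G = vr_grad(X)
    X_next = (I + W) @ X - W_tilde @ X_prev - alpha * (G - G_prev)
    X_prev, X, G_prev = X, X_next, G

print("distance to x*:", np.linalg.norm(X.mean(axis=0) - x_star))
```

In this sketch each node computes one sampled gradient per round plus $n$ gradients per snapshot refresh; this per-epoch cost is the source of the $(κ_s+n)$ factor in SVRG-type complexity bounds. The accelerated variants described in the abstract additionally use Katyusha-style momentum to reach the optimal rates quoted above.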
