Paper Title

Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms

Paper Authors

Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik

Paper Abstract

We consider minimizing the sum of three convex functions, where the first one, F, is smooth, the second is nonsmooth but proximable, and the third is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance in image processing and machine learning. First, we propose a new primal-dual algorithm for this problem, which we call PDDY. It is constructed by applying Davis-Yin splitting to a monotone inclusion in a primal-dual product space, where the operators are monotone under a specific metric depending on L. We show that three existing algorithms (the two forms of the Condat-Vũ algorithm and the PD3O algorithm) have the same structure, so that PDDY is the fourth missing link in this self-consistent class of primal-dual algorithms. This representation eases the convergence analysis: it allows us to derive sublinear convergence rates in general, and linear convergence results in the presence of strong convexity. Moreover, within our broad and flexible analysis framework, we propose new stochastic generalizations of the algorithms, in which a variance-reduced random estimate of the gradient of F is used instead of the true gradient. Furthermore, we obtain, as a special case of PDDY, a linearly converging algorithm for the minimization of a strongly convex function F under a linear constraint; we discuss its important application to decentralized optimization.
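The template problem is min_x F(x) + g(x) + h(Lx). The exact PDDY iterations are given in the paper; as a hedged illustration of the algorithm class it belongs to, the sketch below implements the Condat-Vũ algorithm (one of the four methods named above) on a small synthetic instance with F(x) = (1/2)||Ax − b||², g = λ_g||·||₁, and h = λ_h||·||₁ composed with a discrete difference operator L. All data, step sizes, and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hedged sketch: the Condat-Vu primal-dual algorithm, one of the four
# methods the paper places in a single class alongside PDDY, applied to
#     min_x  F(x) + g(x) + h(L x)
# with F(x) = 0.5*||A x - b||^2 (smooth), g = lam_g*||.||_1 (proximable),
# and h = lam_h*||.||_1 composed with a linear operator L.
# Problem data below are illustrative assumptions, not from the paper.

rng = np.random.default_rng(0)
m, n = 40, 60
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
L = np.eye(n)[1:] - np.eye(n)[:-1]   # discrete difference operator, (n-1) x n
lam_g, lam_h = 0.1, 0.1

def grad_F(x):                        # gradient of the smooth term F
    return A.T @ (A @ x - b)

def prox_l1(u, t):                    # prox of t*||.||_1 (soft-thresholding)
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

beta = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad F
normL = np.linalg.norm(L, 2)
sigma = 1.0 / normL                   # dual step size
tau = 0.9 / (beta / 2 + sigma * normL ** 2)   # Condat-Vu step-size condition

x = np.zeros(n)
y = np.zeros(L.shape[0])
for k in range(2000):
    # primal step: forward step on F, backward (prox) step on g
    x_new = prox_l1(x - tau * (grad_F(x) + L.T @ y), tau * lam_g)
    # dual step on extrapolated primal point, prox of sigma*h* via Moreau
    u = y + sigma * (L @ (2 * x_new - x))
    y = u - sigma * prox_l1(u / sigma, lam_h / sigma)
    x = x_new

print("objective:",
      0.5 * np.linalg.norm(A @ x - b) ** 2
      + lam_g * np.abs(x).sum() + lam_h * np.abs(L @ x).sum())
```

The dual update evaluates the prox of h* through the Moreau decomposition prox_{σh*}(u) = u − σ prox_{h/σ}(u/σ), and the step sizes satisfy the standard Condat-Vũ condition τ(β/2 + σ||L||²) ≤ 1; the paper's stochastic variants would replace grad_F with a variance-reduced estimate.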
