Paper Title
RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates
Paper Authors
Paper Abstract
Proximal splitting algorithms are well suited to solving large-scale nonsmooth optimization problems, in particular those arising in machine learning. We propose a new primal-dual algorithm, in which the dual update is randomized; equivalently, the proximity operator of one of the functions in the problem is replaced by a stochastic oracle. For instance, some randomly chosen dual variables, instead of all of them, are updated at each iteration. Alternatively, the proximity operator of a function is called only with some small probability. A nonsmooth variance-reduction technique is implemented so that the algorithm finds an exact minimizer of the general problem involving smooth and nonsmooth functions, possibly composed with linear operators. We derive linear convergence results in the presence of strong convexity; these results are new even in the deterministic case, in which our algorithm reverts to the recently proposed Primal-Dual Davis-Yin algorithm. Several randomized algorithms from the literature are also recovered as particular cases (e.g., Point-SAGA). But our randomization technique is general and encompasses many unbiased mechanisms beyond sampling and probabilistic updates, including compression. Since the convergence speed depends on the slowest of the primal and dual contraction mechanisms, the iteration complexity may remain the same when randomness is used; on the other hand, the computational complexity can be significantly reduced. Overall, randomness helps to obtain faster algorithms. This has long been known for stochastic-gradient-type algorithms, and our work shows that it fully applies in the more general primal-dual setting as well.
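To make the setting concrete, the following is a minimal LaTeX sketch of the kind of composite template and randomized dual proximal step the abstract alludes to. The specific template min f(x) + g(x) + h(Kx), the step sizes tau and sigma, and the form of the deterministic iteration are illustrative assumptions on our part, not the paper's exact RandProx recursion.

\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Illustrative sketch only: a generic composite template and a randomized dual step,
% written in our own notation; not the paper's exact RandProx iteration.
\[
  \min_{x}\; f(x) + g(x) + h(Kx),
  \qquad f \text{ smooth},\quad g, h \text{ nonsmooth},\quad K \text{ a linear operator}.
\]
A standard deterministic primal-dual iteration applies the two proximity operators in turn:
\begin{align*}
  x^{k+1} &= \operatorname{prox}_{\tau g}\!\bigl(x^{k} - \tau \nabla f(x^{k}) - \tau K^{\top} y^{k}\bigr),\\
  y^{k+1} &= \operatorname{prox}_{\sigma h^{*}}\!\bigl(y^{k} + \sigma K(2x^{k+1} - x^{k})\bigr).
\end{align*}
The randomization described in the abstract replaces the dual update by an unbiased oracle
$\widehat{y}^{\,k+1}$ satisfying
\[
  \mathbb{E}\bigl[\widehat{y}^{\,k+1} \mid y^{k}\bigr]
  = \operatorname{prox}_{\sigma h^{*}}\!\bigl(y^{k} + \sigma K(2x^{k+1} - x^{k})\bigr),
\]
for instance by updating only a randomly chosen subset of dual coordinates, or by applying
the proximity operator only with some probability $p$ (with a variance-reduction correction
so that the exact minimizer is still found).
\end{document}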