Paper Title
Projected Stochastic Gradient Langevin Algorithms for Constrained Sampling and Non-Convex Learning
Paper Authors
Paper Abstract
Langevin algorithms are gradient descent methods with additive noise. They have been used for decades in Markov chain Monte Carlo (MCMC) sampling, optimization, and learning. Their convergence properties for unconstrained non-convex optimization and learning problems have been studied widely in the last few years. Other work has examined projected Langevin algorithms for sampling from log-concave distributions restricted to convex compact sets. For learning and optimization, log-concave distributions correspond to convex losses. In this paper, we analyze the case of non-convex losses with compact convex constraint sets and IID external data variables. We term the resulting method the projected stochastic gradient Langevin algorithm (PSGLA). We show the algorithm achieves a deviation of $O(T^{-1/4}(\log T)^{1/2})$ from its target distribution in 1-Wasserstein distance. For optimization and learning, we show that the algorithm achieves $\epsilon$-suboptimal solutions, on average, provided that it is run for a time that is polynomial in $\epsilon^{-1}$ and slightly super-exponential in the problem dimension.
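As a rough illustration of the recursion the abstract describes, the sketch below shows a generic projected stochastic gradient Langevin update in Python: a stochastic gradient step, additive Gaussian noise, and a Euclidean projection back onto the constraint set. This is not the paper's exact formulation; the function names (`psgla`, `grad_fn`, `data_sampler`, `project_onto_ball`), the ball-shaped constraint set, and the noise scale $\sqrt{2\eta/\beta}$ (a common choice for Langevin dynamics with step size $\eta$ and inverse temperature $\beta$) are illustrative assumptions.

```python
import numpy as np

def project_onto_ball(x, radius=1.0):
    """Euclidean projection onto a ball of the given radius.

    This stands in for an arbitrary compact convex constraint set; any
    Euclidean projection onto such a set could be used instead.
    """
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def psgla(grad_fn, data_sampler, x0, step_size, inverse_temp, n_steps,
          project=project_onto_ball, rng=None):
    """Minimal sketch of a projected stochastic gradient Langevin iteration.

    grad_fn(x, z)  -- stochastic gradient of the loss at x for data sample z
    data_sampler() -- draws one IID external data sample z
    inverse_temp   -- inverse temperature beta controlling the noise level
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.size
    noise_scale = np.sqrt(2.0 * step_size / inverse_temp)
    for _ in range(n_steps):
        z = data_sampler()
        # Langevin step: gradient descent with additive isotropic Gaussian
        # noise, followed by projection back onto the constraint set.
        x = x - step_size * grad_fn(x, z) + noise_scale * rng.standard_normal(d)
        x = project(x)
    return x
```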