Paper Title

Oracle Lower Bounds for Stochastic Gradient Sampling Algorithms

Authors

Chatterji, Niladri S., Bartlett, Peter L., Long, Philip M.

Abstract

We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information-theoretic lower bound on the number of stochastic gradient queries of the log density that are needed. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information-theoretic limit for all these algorithms. We show that for every algorithm, there exists a well-conditioned strongly log-concave target density for which the distribution of points generated by the algorithm would be at least $\varepsilon$ away from the target in total variation distance if the number of gradient queries is less than $\Omega(\sigma^2 d/\varepsilon^2)$, where $\sigma^2 d$ is the variance of the stochastic gradient. Our lower bound follows by combining the ideas of Le Cam deficiency routinely used in the comparison of statistical experiments along with standard information-theoretic tools used in lower bounding Bayes risk functions. To the best of our knowledge, our results provide the first nontrivial dimension-dependent lower bound for this problem.
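
To make the query model concrete, below is a minimal sketch of stochastic gradient Langevin dynamics (SGLD), one algorithm in the class the abstract describes: at each step it issues one stochastic gradient query of the log density. The Gaussian target, step size, noise level, and all function names here are illustrative assumptions for exposition, not taken from the paper.

```python
import numpy as np

def sgld_sample(stochastic_grad_log_density, x0, step_size=1e-3, n_steps=10_000, rng=None):
    """Stochastic gradient Langevin dynamics (illustrative sketch).

    stochastic_grad_log_density(x) should return an unbiased, possibly noisy,
    estimate of grad log pi(x); the variance of this oracle is the quantity
    the abstract's lower bound is stated in terms of.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = stochastic_grad_log_density(x)                 # one stochastic gradient query
        noise = rng.standard_normal(x.shape)               # injected Langevin noise
        x = x + step_size * g + np.sqrt(2.0 * step_size) * noise
    return x

# Hypothetical example: strongly log-concave target pi(x) ∝ exp(-||x||^2 / 2)
# in d = 10, with additive Gaussian noise on the gradient oracle.
d, sigma = 10, 0.5
rng = np.random.default_rng(0)
noisy_grad = lambda x: -x + sigma * rng.standard_normal(d)  # unbiased estimate of -x
sample = sgld_sample(noisy_grad, x0=np.zeros(d), rng=rng)
```

The paper's lower bound applies to any algorithm of this form, regardless of how it chooses query points or post-processes the responses: with too few queries of a noisy gradient oracle, the output distribution must remain at least $\varepsilon$ from some well-conditioned target in total variation.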
