Paper Title

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Authors

Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

Abstract

We design an algorithm which finds an $ε$-approximate stationary point (with $\|\nabla F(x)\|\le ε$) using $O(ε^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and---surprisingly---that it cannot be improved using stochastic $p$th order methods for any $p\ge 2$, even when the first $p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding $(ε,γ)$-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
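For context, the Hessian-vector products referenced in the abstract can be computed without ever forming the full Hessian, e.g. by differencing gradients along the direction of interest. Below is a minimal illustrative sketch (the quadratic objective and its gradient are assumptions for demonstration, not the paper's construction):

```python
import numpy as np

# Illustrative objective: f(x) = 0.5 * x^T A x, whose Hessian is A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])  # symmetric matrix (assumed example)

def grad(x):
    # Gradient of the quadratic objective: ∇f(x) = A x
    return A @ x

def hessian_vector_product(grad_fn, x, v, eps=1e-6):
    # Finite-difference approximation: H v ≈ (∇f(x + eps*v) − ∇f(x)) / eps.
    # Costs two gradient evaluations, never materializing the d×d Hessian.
    return (grad_fn(x + eps * v) - grad_fn(x)) / eps

x = np.array([1.0, -1.0])
v = np.array([0.0, 1.0])
hv = hessian_vector_product(grad, x, v)
# For a quadratic, the finite difference recovers A @ v up to rounding error.
```

In practice such products are often obtained exactly via automatic differentiation (a gradient of a gradient-vector inner product) rather than finite differences; the cost per product remains comparable to a gradient evaluation, which is what makes the $O(ε^{-3})$ rate in the abstract attainable with first-order-style oracle calls.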
