Paper Title
Boosting One-Point Derivative-Free Online Optimization via Residual Feedback
Paper Authors
Paper Abstract
Zeroth-order optimization (ZO) typically relies on two-point feedback to estimate the unknown gradient of the objective function. However, two-point feedback cannot be used for online optimization of time-varying objective functions, where only a single query of the function value is possible at each time step. In this work, we propose a new one-point feedback method for online optimization that estimates the objective function gradient using the residual between two feedback points obtained at consecutive time instants. Moreover, we develop regret bounds for ZO with residual feedback for both convex and nonconvex online optimization problems. Specifically, for both deterministic and stochastic problems, and for both Lipschitz and smooth objective functions, we show that using residual feedback produces gradient estimates with much smaller variance than conventional one-point feedback methods. As a result, our regret bounds are much tighter than existing regret bounds for ZO with conventional one-point feedback, which suggests that ZO with residual feedback can better track the optimizer of online optimization problems. Additionally, our regret bounds rely on weaker assumptions than those used for conventional one-point feedback methods. Numerical experiments show that ZO with residual feedback significantly outperforms existing one-point feedback methods in practice as well.
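
The residual-feedback idea described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' reference implementation: the unit-sphere sampling, the d/delta scaling, the plain projected-free gradient-descent update, and every function and parameter name here (residual_feedback_zo, f_seq, delta, step) are assumptions of this sketch.

import numpy as np

def sample_unit_sphere(d, rng):
    # Draw a direction uniformly at random from the unit sphere in R^d.
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)

def residual_feedback_zo(f_seq, x0, delta=0.1, step=0.01, rng=None):
    # One-point ZO online optimization with residual feedback.
    # At each step t only ONE function value y_t = f_t(x_t + delta * u_t)
    # is queried; the gradient estimate reuses the previous query:
    #     g_t = (d / delta) * (y_t - y_{t-1}) * u_t
    # (a conventional one-point estimator would instead use
    #  g_t = (d / delta) * y_t * u_t, which has much larger variance).
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    d = x.size
    y_prev = 0.0  # previous feedback value, the residual baseline
    for f_t in f_seq:
        u = sample_unit_sphere(d, rng)
        y = f_t(x + delta * u)               # single query at time t
        g = (d / delta) * (y - y_prev) * u   # residual-feedback estimate
        x = x - step * g                     # gradient-descent update
        y_prev = y
    return x

As a hypothetical usage example, tracking the minimizer of a slowly drifting quadratic f_t(x) = ||x - c_t||^2:

# Time-varying objectives with a drifting minimizer c_t.
targets = (np.array([np.sin(0.01 * t), np.cos(0.01 * t)]) for t in range(5000))
fs = (lambda x, c=c: float(np.sum((x - c) ** 2)) for c in targets)
x_final = residual_feedback_zo(fs, x0=np.zeros(2), delta=0.05, step=0.05)

The key design point the abstract argues for is visible in the estimator line: because y_t - y_{t-1} shrinks as the iterates and the objective change slowly, the estimate's variance is much smaller than that of the conventional one-point estimator, which multiplies the raw value y_t by d/delta.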