Paper Title
Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games
Paper Authors
Paper Abstract
We consider the problem of online learning and its application to solving minimax games. For the online learning problem, Follow the Perturbed Leader (FTPL) is a widely studied algorithm which enjoys the optimal $O(T^{1/2})$ worst-case regret guarantee for both convex and nonconvex losses. In this work, we show that when the sequence of loss functions is predictable, a simple modification of FTPL which incorporates optimism can achieve better regret guarantees, while retaining the optimal worst-case regret guarantee for unpredictable sequences. A key challenge in obtaining these tighter regret bounds is the stochasticity and optimism in the algorithm, which requires different analysis techniques than those commonly used in the analysis of FTPL. The key ingredient we utilize in our analysis is the dual view of perturbation as regularization. While our algorithm has several applications, we consider the specific application of minimax games. For solving smooth convex-concave games, our algorithm only requires access to a linear optimization oracle. For Lipschitz and smooth nonconvex-nonconcave games, our algorithm requires access to an optimization oracle which computes the perturbed best response. In both these settings, our algorithm solves the game up to an accuracy of $O(T^{-1/2})$ using $T$ calls to the optimization oracle. An important feature of our algorithm is that it is highly parallelizable and requires only $O(T^{1/2})$ iterations, with each iteration making $O(T^{1/2})$ parallel calls to the optimization oracle.
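To make the optimistic FTPL idea concrete, here is a minimal sketch for the simplest setting the abstract mentions: online linear optimization over the probability simplex, where the linear optimization oracle just returns a vertex. The function name, the exponential perturbation, and the specific loss sequence are illustrative assumptions, not the paper's exact construction; setting the prediction term to zero recovers plain FTPL.

```python
import numpy as np

def optimistic_ftpl_step(past_grads, prediction, eta, d, rng):
    """One round of optimistic Follow the Perturbed Leader (sketch).

    past_grads: loss gradients observed in earlier rounds
    prediction: a guess of the next loss gradient (the "optimism" term);
                a zero vector recovers plain FTPL
    eta:        scale of the random perturbation
    """
    # Random perturbation; exponential noise is the classic FTPL choice.
    sigma = rng.exponential(scale=eta, size=d)
    # Score = cumulative past losses + optimistic guess - perturbation.
    score = np.sum(past_grads, axis=0) + prediction - sigma
    # Linear optimization oracle over the simplex: a linear function is
    # minimized at a vertex, i.e., a one-hot point.
    x = np.zeros(d)
    x[np.argmin(score)] = 1.0
    return x

# Usage: a short online loop with a perfectly predictable loss sequence,
# where optimism (prediction == next gradient) should keep regret small.
rng = np.random.default_rng(0)
d, T = 3, 50
g = np.array([1.0, 0.2, 0.5])      # the same linear loss every round
grads, total_loss = [np.zeros(d)], 0.0
for t in range(T):
    x = optimistic_ftpl_step(grads, prediction=g, eta=1.0, d=d, rng=rng)
    total_loss += g @ x
    grads.append(g)
best_fixed = T * g.min()           # comparator: best fixed vertex in hindsight
print(total_loss - best_fixed)     # regret; stays small on this predictable sequence
```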