No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation

Title

No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation

Authors

Yu-Guan Hsieh, Kimon Antonakopoulos, Volkan Cevher, Panayotis Mertikopoulos

Abstract

We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation -- that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that attains nearly the same guarantees as its non-adapted counterpart, while operating without knowledge of either the game or of the noise profile.
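
To make the idea of learning rate separation concrete, here is a minimal sketch of an optimistic gradient loop in which the extrapolation step and the update step use different step sizes. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the toy bilinear game min_x max_y x*y, the constant step sizes `gamma` and `eta` (the abstract describes schedules that depend on the noise profile), the Gaussian multiplicative noise model, and all function names.

```python
import random


def payoff_gradient_field(x, y):
    """Individual gradients for the toy game min_x max_y x*y (assumed example).

    Player 1 minimizes x*y, so its gradient is y.
    Player 2 minimizes -x*y, so its gradient is -x.
    """
    return y, -x


def noisy_field(x, y, sigma, rng):
    """Multiplicative noise: the error scales with the gradient's magnitude."""
    gx, gy = payoff_gradient_field(x, y)
    return (gx * (1.0 + sigma * rng.gauss(0.0, 1.0)),
            gy * (1.0 + sigma * rng.gauss(0.0, 1.0)))


def optimistic_gradient_separated(x0, y0, gamma, eta, steps, sigma=0.0, seed=0):
    """Optimistic gradient with separate extrapolation (gamma) and update (eta) rates.

    Hypothetical helper: constant rates stand in for the time-varying
    schedules mentioned in the abstract.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    gx_prev, gy_prev = 0.0, 0.0  # no feedback is available before the first round
    for _ in range(steps):
        # Extrapolation step: reuse the previous feedback with the rate gamma.
        x_half = x - gamma * gx_prev
        y_half = y - gamma * gy_prev
        # Query noisy gradients at the extrapolated point.
        gx, gy = noisy_field(x_half, y_half, sigma, rng)
        # Update step: move the base point with the rate eta.
        x = x - eta * gx
        y = y - eta * gy
        gx_prev, gy_prev = gx, gy
    return x, y


if __name__ == "__main__":
    # Noise-free run with gamma > eta; the iterates approach the equilibrium (0, 0).
    print(optimistic_gradient_separated(1.0, 1.0, gamma=0.2, eta=0.05, steps=2000))
    # Same rates with multiplicative noise (sigma is an arbitrary illustrative value).
    print(optimistic_gradient_separated(1.0, 1.0, gamma=0.2, eta=0.05, steps=2000,
                                        sigma=0.5, seed=1))
```

Setting `gamma` equal to `eta` recovers a plain optimistic gradient loop, so the two-rate form only adds one extra knob. The sketch does not attempt the fully adaptive method mentioned at the end of the abstract.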
