Non-Stochastic Control with Bandit Feedback

Paper Title

Non-Stochastic Control with Bandit Feedback

Authors

Paula Gradu, John Hallman, Elad Hazan

Abstract

We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a known or unknown system, we give an efficient sublinear regret algorithm. The main algorithmic difficulty is the dependence of the loss on past controls. To overcome this issue, we propose an efficient algorithm for the general setting of bandit convex optimization for loss functions with memory, which may be of independent interest.
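To make "bandit feedback" concrete, the following is a minimal sketch of the classical one-point gradient estimator for bandit convex optimization, where the learner sees only the scalar loss at the point it plays. It is an illustration of the general idea under simplifying assumptions, not the algorithm from the paper: it ignores the memory (dependence on past decisions) that the abstract names as the main difficulty. The function names, step size `eta`, perturbation size `delta`, and the quadratic test loss are all hypothetical choices made for this example.

```python
import numpy as np


def sample_unit_sphere(rng, dim):
    """Draw a point uniformly from the unit sphere in R^dim."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def bandit_gradient_descent(loss, dim, steps, delta=0.1, eta=0.01,
                            radius=1.0, seed=0):
    """Gradient descent using only scalar loss values (no gradients).

    Assumption: the feasible set is a Euclidean ball of the given radius,
    and the loss has no memory of earlier decisions.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    losses = []
    for _ in range(steps):
        u = sample_unit_sphere(rng, dim)
        y = x + delta * u            # perturbed point that is actually played
        value = loss(y)              # the only feedback: one scalar
        losses.append(value)
        # One-point estimate: its expectation is the gradient of a
        # delta-smoothed version of the loss at x.
        g = (dim / delta) * value * u
        x = x - eta * g
        # Project onto a shrunken ball so the next played point stays feasible.
        max_norm = (1.0 - delta) * radius
        norm = np.linalg.norm(x)
        if norm > max_norm:
            x = x * (max_norm / norm)
    return x, losses


if __name__ == "__main__":
    # Hypothetical quadratic loss, used only to exercise the sketch.
    target = np.full(3, 0.3)
    x_final, history = bandit_gradient_descent(
        lambda y: float(np.sum((y - target) ** 2)), dim=3, steps=2000)
    print("final iterate:", x_final)
    print("mean loss over last 100 rounds:", np.mean(history[-100:]))
```

The estimate is noisy because a single scalar must stand in for a full gradient, which is why bandit methods typically need smaller steps and obtain weaker regret rates than their full-information counterparts.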
