Optimal control with learning on the fly: a toy problem

Paper title

Optimal control with learning on the fly: a toy problem

Authors

Fefferman, Charles L.; Pegueroles, Bernat Guillen; Rowley, Clarence W.; Weber, Melanie

Abstract

We exhibit optimal control strategies for a simple toy problem in which the underlying dynamics depend on a parameter that is initially unknown and must be learned. We consider a cost function posed over a finite time interval, in contrast to much previous work that considers asymptotics as the time horizon tends to infinity. We study several different versions of the problem, including Bayesian control, in which we assume a prior distribution on the unknown parameter; and "agnostic" control, in which we assume nothing about the unknown parameter. For the agnostic problems, we compare our performance with that of an opponent who knows the value of the parameter. This comparison gives rise to several notions of "regret," and we obtain strategies that minimize the "worst-case regret" arising from the most unfavorable choice of the unknown parameter. In every case, the optimal strategy turns out to be a Bayesian strategy or a limit of Bayesian strategies.
