Balancing Exploration for Online Receding Horizon Learning Control with Provable Regret Guarantees

Paper Title

Balancing Exploration for Online Receding Horizon Learning Control with Provable Regret Guarantees

Paper Authors

Muthirayan, Deepan; Yuan, Jianjun; Khargonekar, Pramod P.

Paper Abstract

We address the problem of simultaneous learning and control in an online receding horizon control setting. We consider the control of an unknown linear dynamical system with general cost functions and affine constraints on the control input. Our goal is to develop an online learning algorithm that minimizes the dynamic regret, defined as the difference between the cumulative cost incurred by the algorithm and that of the best policy that has full knowledge of the system, the cost functions, and the state and satisfies the control input constraints. We propose a novel approach to exploration in an online receding horizon setting. The key challenge is to ensure that the control generated by the receding horizon controller is persistently exciting. Our approach is to apply to the control input generated by the receding horizon controller a perturbation that balances the exploration-exploitation trade-off. By identifying the conditions under which sub-linear regret is guaranteed, we show that both the regret and the cumulative constraint violation of the proposed controller are upper bounded by $\tilde{\mathcal{O}}(T^{3/4})$ when the controller has a preview of the cost functions over intervals that double in size from one interval to the next.
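The central idea of the abstract is to add a perturbation to the receding horizon control input so the collected data stay persistently exciting and the unknown model can be learned. The following is only a minimal Python/NumPy sketch of that general pattern and is not the authors' algorithm. It assumes a time-invariant quadratic cost x'Qx + u'Ru with Q = R = I, no process noise, a zero initial model guess, box clipping as a stand-in for the affine input constraints, and a hypothetical exploration scale sigma_k = sigma0/(k+1). The doubling intervals are used here purely as least-squares re-estimation epochs, which is an assumption, since the abstract ties the doubling to the cost preview. The names `finite_horizon_gain` and `run` are hypothetical. An "oracle" run that uses the true model without perturbation serves as a rough proxy for the benchmark policy, so the difference in cost is an empirical regret proxy rather than the paper's dynamic regret.

```python
import numpy as np


def finite_horizon_gain(A, B, Q, R, H):
    """First-step gain of an H-step LQR problem via backward Riccati recursion.

    For a time-invariant quadratic cost, a receding horizon controller applies
    u = -K x with this same K at every step, so it is computed once per interval.
    """
    P = Q.copy()
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(H):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


def run(A, B, T, H=5, sigma0=0.5, u_max=1.0, oracle=False, seed=0):
    """Simulate T steps; return (cumulative cost, A_hat, B_hat)."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    Q, R = np.eye(n), np.eye(m)
    # Learner starts from a zero model guess (assumption); the oracle knows (A, B).
    A_hat, B_hat = (A, B) if oracle else (np.zeros((n, n)), np.zeros((n, m)))
    x = np.ones(n)
    Z, Y = [], []  # regressors [x_t, u_t] and targets x_{t+1}
    cost, t, k = 0.0, 0, 0
    while t < T:
        # Hypothetical exploration scale that shrinks from one interval to the next.
        sigma = 0.0 if oracle else sigma0 / (k + 1)
        if Z and not oracle:
            # Least-squares re-estimate at the start of each interval:
            # x_{t+1}^T = [x_t^T, u_t^T] [A^T; B^T]
            theta, *_ = np.linalg.lstsq(np.array(Z), np.array(Y), rcond=None)
            A_hat, B_hat = theta[:n].T, theta[n:].T
        K = finite_horizon_gain(A_hat, B_hat, Q, R, H)
        for _ in range(min(2 ** k, T - t)):  # interval lengths 1, 2, 4, ...
            # Receding horizon input plus an exploration perturbation.
            u = -K @ x + sigma * rng.standard_normal(m)
            # Box clipping as a simple stand-in for affine input constraints.
            u = np.clip(u, -u_max, u_max)
            cost += x @ Q @ x + u @ R @ u
            x_next = A @ x + B @ u  # true (unknown) dynamics, no process noise
            Z.append(np.concatenate([x, u]))
            Y.append(x_next)
            x, t = x_next, t + 1
        k += 1
    return cost, A_hat, B_hat
```

A possible usage, on an illustrative stable system chosen only for this example:

```python
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.eye(2)
learner_cost, A_hat, B_hat = run(A, B, T=200)
oracle_cost, _, _ = run(A, B, T=200, oracle=True)
print("empirical regret proxy:", learner_cost - oracle_cost)
```

Without the perturbation, the inputs would be a deterministic function of the state, and the regressor matrix of stacked [x_t, u_t] could become rank deficient, so (A, B) would not be identifiable. That is the persistent-excitation issue the abstract highlights.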
