Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems

Paper Title

Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems

Author

Faradonbeh, Mohamad Kazem Shirani

Abstract

This work theoretically studies a ubiquitous reinforcement learning policy for controlling the canonical model of continuous-time stochastic linear-quadratic systems. We show that a randomized certainty equivalent policy addresses the exploration-exploitation dilemma in linear control systems that evolve according to unknown stochastic differential equations and whose operating cost is quadratic. More precisely, we establish square-root-of-time regret bounds, indicating that the randomized certainty equivalent policy quickly learns optimal control actions from a single state trajectory. Further, the regret is shown to scale linearly with the number of parameters. The presented analysis introduces novel and useful technical approaches, and sheds light on fundamental challenges of continuous-time reinforcement learning.
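
For orientation, the setting described in the abstract can be sketched in standard linear-quadratic notation. The abstract itself gives no formulas, so every symbol below ($A_0$, $B_0$, $Q$, $R$, $P$, $W_t$, $\widehat{A}_t$, $\widehat{B}_t$, $\pi$, $\pi^\star$) is an assumption made for illustration, and the paper's own formulation, noise model, and the exact form of its bound (including any logarithmic factors) may differ.

```latex
% Assumed dynamics: an unknown linear SDE driven by a Wiener process W_t.
\[
  \mathrm{d}X_t = \left( A_0 X_t + B_0 U_t \right) \mathrm{d}t + \mathrm{d}W_t .
\]
% Assumed quadratic instantaneous cost, with Q and R positive definite.
\[
  c_t = X_t^\top Q X_t + U_t^\top R U_t .
\]
% With A_0 and B_0 known, the optimal policy is linear state feedback,
% where P solves the continuous-time algebraic Riccati equation.
\[
  U_t^\star = -R^{-1} B_0^\top P X_t ,
  \qquad
  A_0^\top P + P A_0 - P B_0 R^{-1} B_0^\top P + Q = 0 .
\]
% Certainty equivalence: replace (A_0, B_0) by current estimates
% (\widehat{A}_t, \widehat{B}_t) when computing the feedback gain;
% the "randomized" variant adds randomization to keep exploring.
% Regret of a policy \pi against the optimal policy \pi^\star:
\[
  \mathrm{Reg}(T) = \int_0^T \left( c_t^{\pi} - c_t^{\pi^\star} \right) \mathrm{d}t ,
\]
% which the abstract states grows on the order of \sqrt{T}.
```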
