Paper title
A Q-learning algorithm for discrete-time linear-quadratic control with random parameters of unknown distribution: convergence and stabilization
Paper authors
Paper abstract
This paper studies an infinite horizon optimal control problem for discrete-time linear systems and quadratic criteria, both with random parameters which are independent and identically distributed with respect to time. A classical approach is to solve an algebraic Riccati equation that involves mathematical expectations and requires certain statistical information of the parameters. In this paper, we propose an online iterative algorithm in the spirit of Q-learning for the situation where only one random sample of parameters emerges at each time step. The first theorem proves the equivalence of three properties: the convergence of the learning sequence, the well-posedness of the control problem, and the solvability of the algebraic Riccati equation. The second theorem shows that the adaptive feedback control in terms of the learning sequence stabilizes the system as long as the control problem is well-posed. Numerical examples are presented to illustrate our results.
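As a rough illustration of the kind of online iteration the abstract describes, the sketch below runs a stochastic-approximation update on a Q-function matrix H, using one parameter sample per time step. It is not taken from the paper: the dynamics x_{t+1} = A_t x_t + B_t u_t, the stage cost [x; u]^T N_t [x; u], the step size 1/(t+1), the zero initial guess, and the names `schur_value`, `feedback_gain`, `q_learning_lq`, and `sample` are all assumptions made for illustration. Under these assumptions a fixed point H satisfies H = E[N + [A B]^T P(H) [A B]] with P(H) = H_xx - H_xu H_uu^{-1} H_ux, which is one standard way of writing the algebraic Riccati equation with expectations.

```python
import numpy as np


def schur_value(H, n):
    """Value matrix P(H) = H_xx - H_xu H_uu^+ H_ux (pseudo-inverse for safety)."""
    Hxx, Hxu, Huu = H[:n, :n], H[:n, n:], H[n:, n:]
    return Hxx - Hxu @ np.linalg.pinv(Huu) @ Hxu.T


def feedback_gain(H, n):
    """Gain K such that u = K x minimises [x; u]^T H [x; u] over u."""
    return -np.linalg.pinv(H[n:, n:]) @ H[n:, :n]


def q_learning_lq(sample, n, m, steps):
    """Average sampled Bellman targets, one parameter sample per step.

    `sample` is a hypothetical callable returning (A, B, N), where A is
    n x n, B is n x m, and N is the (n+m) x (n+m) stage-cost matrix.
    """
    H = np.zeros((n + m, n + m))  # assumed initial guess
    for t in range(steps):
        A, B, N = sample()
        AB = np.hstack([A, B])
        # Sampled target: stage cost plus next-state value under the current estimate.
        target = N + AB.T @ schur_value(H, n) @ AB
        # Assumed step size 1/(t+1), i.e. a running average of the targets.
        H += (target - H) / (t + 1)
    return H


# Scalar example (assumed data): A_t is noisy around 1, B_t = 1, N_t = identity.
rng = np.random.default_rng(0)


def sample():
    A = np.array([[1.0 + 0.1 * rng.standard_normal()]])
    B = np.array([[1.0]])
    N = np.eye(2)
    return A, B, N


H = q_learning_lq(sample, n=1, m=1, steps=5000)
P = schur_value(H, 1)    # estimate of the Riccati solution
K = feedback_gain(H, 1)  # adaptive feedback gain, u = K x
```

The learner never sees the distribution of (A_t, B_t, N_t), only the samples, which is the setting the abstract emphasises. In the noise-free special case A = B = 1, N = I, the fixed point reduces to P^2 - P - 1 = 0, so the estimate should approach the golden ratio.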