Model-free Learning for Risk-constrained Linear Quadratic Regulator with Structured Feedback in Networked Systems

Paper title

Model-free Learning for Risk-constrained Linear Quadratic Regulator with Structured Feedback in Networked Systems

Authors

Kwon, Kyung-bin; Ye, Lintao; Gupta, Vijay; Zhu, Hao

Abstract

We develop a model-free learning algorithm for the infinite-horizon linear quadratic regulator (LQR) problem. Specifically, we consider risk constraints and structured feedback, in order to reduce the state deviation while allowing for a sparse communication graph in practice. By reformulating the dual problem as a nonconvex-concave minimax problem, we adopt the gradient descent max-oracle (GDmax) algorithm and, for the model-free setting, the stochastic (S)GDmax using zero-order policy gradient. By bounding the Lipschitz and smoothness constants of the LQR cost using specifically defined sublevel sets, we can design the stepsize and related parameters to establish convergence to a stationary point (with high probability). Numerical tests on a networked microgrid control problem validate the convergence of our proposed SGDmax algorithm while demonstrating the effectiveness of the risk constraints. The SGDmax algorithm attains a satisfactory optimality gap compared to the classical LQR control, especially in the full feedback case.
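To make the "zero-order policy gradient" ingredient concrete, the following is a minimal sketch and not the authors' implementation. It estimates the gradient of an LQR cost from cost evaluations only and restricts the feedback gain to a sparsity pattern (structured feedback). Everything specific here is an assumption made for illustration: the 2-state system matrices, the diagonal `mask`, the finite-horizon surrogate of the infinite-horizon cost, the two-point estimator (the paper may use a different zero-order estimator), and the hand-picked stepsize and smoothing radius. The risk constraint and the dual max-oracle step of GDmax are omitted.

```python
import numpy as np

def lqr_cost(K, A, B, Q, R, X0, horizon=100):
    """Finite-horizon surrogate of the LQR cost under u = -K x,
    averaged over the initial states stored as columns of X0."""
    X = X0.copy()
    total = 0.0
    for _ in range(horizon):
        U = -K @ X
        total += np.sum(X * (Q @ X)) + np.sum(U * (R @ U))
        X = A @ X + B @ U
    return total / X0.shape[1]

def zero_order_gradient(cost, K, mask, radius, num_samples, rng):
    """Two-point zero-order gradient estimate of cost at K.
    Perturbations are drawn only on entries where mask == 1,
    so the estimate keeps the structured (sparse) pattern."""
    d = int(mask.sum())  # number of free entries of K
    grad = np.zeros_like(K)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape) * mask
        U /= np.linalg.norm(U)  # unit Frobenius norm in the masked subspace
        delta = cost(K + radius * U) - cost(K - radius * U)
        grad += (d / (2.0 * radius)) * delta * U
    return grad / num_samples

# Hypothetical 2-state example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])  # open-loop stable, so K = 0 is stabilizing
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
X0 = np.eye(2)       # two initial states, one per column
mask = np.eye(2)     # decentralized structure: only diagonal gains allowed
K = np.zeros((2, 2))

def cost(gain):
    return lqr_cost(gain, A, B, Q, R, X0)

initial = cost(K)
for _ in range(30):
    g = zero_order_gradient(cost, K, mask, radius=0.05, num_samples=10, rng=rng)
    K -= 0.01 * g    # plain gradient descent on the masked gain
final = cost(K)
print(initial, final)
```

In a genuinely model-free setting, `lqr_cost` would be replaced by costs observed from system rollouts; the simulated system here only stands in for that oracle.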
