Paper Title

Data-driven optimal control with a relaxed linear program

Authors

Andrea Martinelli, Matilde Gargiani, John Lygeros

Abstract

The linear programming (LP) approach has a long history in the theory of approximate dynamic programming. When it comes to computation, however, the LP approach often suffers from poor scalability. In this work, we introduce a relaxed version of the Bellman operator for q-functions and prove that it is still a monotone contraction mapping with a unique fixed point. In the spirit of the LP approach, we exploit the new operator to build a relaxed linear program (RLP). Compared to the standard LP formulation, our RLP has only one family of constraints and half the decision variables, making it more scalable and computationally efficient. For deterministic systems, the RLP trivially returns the correct q-function. For stochastic linear systems in continuous spaces, the solution to the RLP preserves the minimizer of the optimal q-function, hence retrieves the optimal policy. Theoretical results are backed up in simulation where we solve sampled versions of the LPs with data collected by interacting with the environment. For general nonlinear systems, we observe that the RLP again tends to preserve the minimizers of the solution to the LP, though the relative performance is influenced by the specific geometry of the problem.
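
The abstract refers to the exact and relaxed Bellman operators for q-functions without displaying them. The sketch below reconstructs them in our own notation, assuming the standard discounted-cost setting with stage cost \ell, discount factor \gamma \in (0,1), and random successor state x^+; it is an illustration consistent with the abstract, not a formula copied from the paper.

```latex
% Exact Bellman operator for q-functions: expectation of the pointwise minimum.
(\mathcal{T} q)(x,u) = \ell(x,u) + \gamma\, \mathbb{E}\!\left[ \min_{u'} q(x^{+},u') \right]

% Relaxed operator: the minimization is pulled outside the expectation, so
% \mathcal{T} q \le \hat{\mathcal{T}} q pointwise, since an expectation of a
% minimum never exceeds the minimum of the expectations.
(\hat{\mathcal{T}} q)(x,u) = \ell(x,u) + \gamma \min_{u'} \mathbb{E}\!\left[ q(x^{+},u') \right]

% The constraint q \le \hat{\mathcal{T}} q is equivalent to one linear family,
% so the relaxed linear program (RLP) involves only the q variables:
\max_{q} \int q \,\mathrm{d}c
\quad \text{s.t.} \quad
q(x,u) \le \ell(x,u) + \gamma\, \mathbb{E}\!\left[ q(x^{+},u') \right]
\quad \forall (x,u,u')
```

Because the inner minimum is replaced by enumerating u' in the constraint family, no auxiliary value-function variables are needed, which is consistent with the abstract's claim of a single constraint family and half the decision variables; for deterministic dynamics the expectation is trivial, the two operators coincide, and the RLP recovers the exact q-function.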
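
To make the sampled-LP idea concrete, here is a minimal runnable sketch that solves the RLP on a toy finite, deterministic MDP with scipy.optimize.linprog. Everything here (sizes, random costs, deterministic transitions) is hypothetical and only mirrors the structure of the RLP; the paper's experiments instead use continuous-space systems and transition data collected from the environment.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy problem: a small finite, deterministic MDP (not from the paper).
nS, nA, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
cost = rng.uniform(0.0, 1.0, size=(nS, nA))   # stage cost l(s, a)
nxt = rng.integers(0, nS, size=(nS, nA))      # deterministic successor f(s, a)

# Decision variables: q(s, a), flattened to index s * nA + a.
n = nS * nA
obj = -np.ones(n)  # linprog minimizes, so maximize the sum of q via its negative

# Single constraint family of the RLP: for every (s, a) and every a',
#   q(s, a) - gamma * q(f(s, a), a') <= l(s, a).
A_ub, b_ub = [], []
for s in range(nS):
    for a in range(nA):
        for a2 in range(nA):
            row = np.zeros(n)
            row[s * nA + a] += 1.0
            row[nxt[s, a] * nA + a2] -= gamma  # += / -= handles self-loops
            A_ub.append(row)
            b_ub.append(cost[s, a])

res = linprog(obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
q = res.x.reshape(nS, nA)
policy = q.argmin(axis=1)  # greedy (cost-minimizing) policy from q
print("q-function:\n", q)
print("greedy policy:", policy)
```

For this deterministic toy problem the constraint family encodes q ≤ T̂q exactly, so the maximizer of the LP is the fixed point of the relaxed operator and, per the abstract, the correct q-function.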
