A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Paper title

A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Authors

Lukas Beckenbach, Pavel Osinenko, Stefan Streif

Abstract

Reinforcement learning is commonly associated with training of reward-maximizing (or cost-minimizing) agents, in other words, controllers. It can be applied in model-free or model-based fashion, using a priori or online collected system data to train involved parametric architectures. In general, online reinforcement learning does not guarantee closed loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Particularly promising are hybrids of reinforcement learning with "classical" control approaches. In this work, we suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting, i.e., without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed results, we employ techniques of classical adaptive control. The implementation of the overall control scheme is provided explicitly in a digital, sampled setting. That is, the controller receives the state of the system and computes the control action at discrete, specifically, equidistant moments in time. The method is tested in adaptive traction control and cruise control where it proved to significantly reduce the cost.
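
The sampled setting described in the abstract (state read and control action computed at equidistant moments, then held until the next sample) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' method: the scalar plant `x' = x + u`, the fixed feedback `u = -2x` standing in for a learned controller, the sampling period `delta`, and the function name `simulate_sampled_loop` are all hypothetical.

```python
def simulate_sampled_loop(x0, controller, dynamics, delta=0.01, steps=500, substeps=10):
    """Simulate a sample-and-hold control loop.

    The controller is evaluated once per sampling period `delta`; the
    resulting action is held constant while the plant evolves in between.
    """
    x = x0
    trajectory = [x]              # state at each sampling instant
    h = delta / substeps          # integration step inside one sampling period
    for _ in range(steps):
        u = controller(x)         # control computed at the sampling instant only
        for _ in range(substeps):
            x = x + h * dynamics(x, u)   # explicit Euler step with u held fixed
        trajectory.append(x)
    return trajectory


# Hypothetical toy plant (open-loop unstable) and stand-in feedback law.
def toy_dynamics(x, u):
    return x + u


def toy_controller(x):
    return -2.0 * x


trajectory = simulate_sampled_loop(1.0, toy_controller, toy_dynamics)
print(f"initial state: {trajectory[0]:.4f}, final state: {trajectory[-1]:.4f}")
```

In this toy case the held control drives the sampled state toward the origin. A learning-based controller of the kind the abstract describes would replace `toy_controller`, and the stability guarantee would then depend on the paper's own design, which this sketch does not reproduce.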
