Off-Policy Risk-Sensitive Reinforcement Learning-Based Optimal Tracking Control with Prescribed Performances

Paper Title

Off-Policy Risk-Sensitive Reinforcement Learning-Based Optimal Tracking Control with Prescribed Performances

Authors

Li, C., Wang, Y., Liu, F., Buss, M.

Abstract

An off-policy reinforcement learning-based control strategy is developed for the optimal tracking control problem, achieving prescribed performance of all states during the learning process. The optimal tracking control problem is converted into an optimal regulation problem based on an auxiliary system. The prescribed performance requirements are transformed into constraint satisfaction problems, which are handled by risk-sensitive state penalty terms within an optimization framework. To obtain approximate solutions of the Hamilton-Jacobi-Bellman equation, an off-policy adaptive critic learning architecture is developed that uses current data and experience data together. By using experience data, the proposed weight estimation update law of the critic learning agent guarantees convergence of the weights to their actual values. This technique is more practical than common methods that must inject external signals to satisfy the persistence of excitation condition for weight convergence. Proofs of closed-loop stability and weight convergence are provided. Simulation results demonstrate the validity of the proposed off-policy risk-sensitive reinforcement learning-based control strategy.
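The role of experience data in weight convergence can be illustrated with a small toy example. The sketch below is not the update law from the paper; it is a generic gradient step on a linear-in-parameters model, and every name in it (`update_weights`, `run`, `STORED`, the weights and the learning rate) is a hypothetical choice made for illustration. It assumes the current regressor never excites the second weight, so the current data alone is not persistently exciting, while a few stored samples span the full parameter space.

```python
# Toy illustration (assumed setup, not the paper's update law):
# fit y = w^T phi by gradient steps on current data plus stored experience.

TRUE_W = [2.0, -1.0]  # hypothetical "actual" weights to be recovered

# Stored experience samples (phi, y), consistent with TRUE_W.
# Together with the current regressor they span both weight directions.
STORED = [([1.0, 1.0], 1.0), ([0.0, 1.0], -1.0)]


def update_weights(w, current, memory, lr=0.1):
    """One gradient step on the summed squared error over all samples."""
    grad = [0.0] * len(w)
    for phi, y in [current] + memory:
        err = sum(wi * pi for wi, pi in zip(w, phi)) - y
        for i, pi in enumerate(phi):
            grad[i] += err * pi
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def run(memory, steps=500):
    """Iterate the update with a current regressor that is not exciting."""
    phi_now = [1.0, 0.0]  # second weight is never excited by current data
    y_now = sum(t * p for t, p in zip(TRUE_W, phi_now))
    w = [0.0, 0.0]
    for _ in range(steps):
        w = update_weights(w, (phi_now, y_now), memory)
    return w


w_with_memory = run(STORED)  # both weights approach TRUE_W
w_without_memory = run([])   # second weight stays at its initial value
```

In this toy setting, the stored samples make the summed regressor matrix full rank, which is what allows both weights to converge without adding an external probing signal.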
