Paper Title

An online evolving framework for advancing reinforcement-learning based automated vehicle control

Authors

Han, Teawon, Nageshrao, Subramanya, Filev, Dimitar P., Ozguner, Umit

Abstract

In this paper, an online evolving framework is proposed to detect and revise a controller's imperfect decision-making in advance. The framework consists of three modules: the evolving Finite State Machine (e-FSM), action-reviser, and controller modules. The e-FSM module evolves a stochastic model (e.g., a Discrete-Time Markov Chain) from scratch by determining new states and identifying transition probabilities repeatedly. With the latest stochastic model and given criteria, the action-reviser module checks the validity of the controller's chosen action by predicting future states. If the chosen action is not appropriate, other actions are inspected and one is selected. To show the advantage of the proposed framework, Deep Deterministic Policy Gradient (DDPG) controllers with and without the online evolving framework are applied to control an ego-vehicle in a car-following scenario where control criteria are set by speed and safety. Experimental results show that inappropriate actions chosen by the DDPG controller are detected and revised appropriately through our proposed framework, resulting in no control failures after a few iterations.
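The abstract's two core mechanisms can be illustrated with a minimal sketch: a Discrete-Time Markov Chain that grows as new states are observed and re-estimates transition probabilities from running counts, and an action-reviser that checks a chosen action's predicted outcome against given criteria and inspects alternatives when it fails. This is only an illustrative reading of the abstract, not the paper's implementation; the state names, action names, and the `is_acceptable` criterion below are all hypothetical, and the paper's e-FSM state-determination step for continuous observations is omitted.

```python
from collections import defaultdict


class EvolvingDTMC:
    """Sketch of an evolving stochastic model: states and actions are
    added implicitly as they are first observed, and per-action
    transition probabilities are re-estimated from running counts."""

    def __init__(self):
        # counts[action][prev_state][next_state] -> observation count
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def observe(self, prev_state, action, next_state):
        # Repeatedly identify transition statistics from experience.
        self.counts[action][prev_state][next_state] += 1

    def transition_probs(self, state, action):
        # Normalized counts give the current transition probabilities.
        row = self.counts[action][state]
        total = sum(row.values())
        return {s: c / total for s, c in row.items()} if total else {}

    def predict_next(self, state, action):
        # Most likely next state under the current model (None if unseen).
        row = self.counts[action][state]
        return max(row, key=row.get) if row else None


def revise_action(model, state, chosen_action, candidate_actions, is_acceptable):
    """If the predicted outcome of the controller's chosen action violates
    the given criteria, inspect other candidate actions and select an
    acceptable one; otherwise keep the original choice."""
    if is_acceptable(model.predict_next(state, chosen_action)):
        return chosen_action
    for action in candidate_actions:
        if is_acceptable(model.predict_next(state, action)):
            return action
    return chosen_action  # no acceptable alternative found; keep original
```

For example, in a car-following setting the model might learn that accelerating from a "close" state leads to a "collision" state while braking leads to a "safe_gap" state, so the reviser would replace a DDPG-chosen "accelerate" with "brake".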
