Driving-Policy Adaptive Safeguard for Autonomous Vehicles Using Reinforcement Learning

Paper Title

Driving-Policy Adaptive Safeguard for Autonomous Vehicles Using Reinforcement Learning

Authors

Zhong Cao, Shaobing Xu, Songan Zhang, Huei Peng, Diange Yang

Abstract

Safeguard functions, such as advanced emergency braking (AEB), provide an additional layer of safety for autonomous vehicles (AVs). A smart safeguard function should adapt its activation conditions to the driving policy, both to avoid unnecessary interventions and to improve vehicle safety. This paper proposes a driving-policy adaptive safeguard (DPAS) design that consists of a collision avoidance strategy and an activation function. The collision avoidance strategy is designed in a reinforcement learning framework and obtained by Monte Carlo Tree Search (MCTS). It can learn from past collisions and control both braking and steering in stochastic traffic. The driving-policy adaptive activation function should dynamically assess the risk of the current driving policy and kick in when an urgent threat is detected. To generate this activation function, the exploration and rollout modules of MCTS are designed to first fully evaluate the AV's current driving policy and then explore other, safer actions. In this study, the DPAS is validated with two typical highway-driving policies. The results are obtained from 90,000 runs in stochastic and aggressive simulated traffic and are calibrated with naturalistic driving data. They show that, compared with state-based benchmark safeguards, the proposed safeguard significantly reduces the collision rate without introducing more interventions. In summary, the proposed safeguard leverages a learning-based method in stochastic and emergent scenarios and imposes minimal influence on the driving policy.
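
The idea of an activation function that adapts to the driving policy can be illustrated with a toy sketch. It is not the paper's method: it swaps MCTS for plain Monte Carlo rollouts in a hypothetical one-dimensional car-following model, and every name, constant, and threshold below is an assumption made for illustration. The safeguard first estimates the collision risk of the current driving policy, and intervenes only if that risk exceeds a threshold and a fallback braking action is estimated to be safer.

```python
import random

DT = 0.1       # simulation step in seconds (assumed)
HORIZON = 30   # rollout length in steps, i.e. 3 s (assumed)


def rollout_collides(gap, v_ego, v_lead, policy, rng):
    """Simulate one random future; return True if the ego reaches the lead car."""
    for _ in range(HORIZON):
        # Toy stochastic traffic: the lead car keeps speed or brakes at random.
        a_lead = rng.choice([0.0, -2.0, -6.0])
        a_ego = policy(gap, v_ego, v_lead)
        v_lead = max(0.0, v_lead + a_lead * DT)
        v_ego = max(0.0, v_ego + a_ego * DT)
        gap += (v_lead - v_ego) * DT
        if gap <= 0.0:
            return True
    return False


def collision_risk(gap, v_ego, v_lead, policy, n=200, seed=0):
    """Estimate the collision probability of `policy` from n random rollouts."""
    rng = random.Random(seed)
    hits = sum(rollout_collides(gap, v_ego, v_lead, policy, rng) for _ in range(n))
    return hits / n


def cruise_policy(gap, v_ego, v_lead):
    """Hypothetical aggressive driving policy: hold the current speed."""
    return 0.0


def brake_policy(gap, v_ego, v_lead):
    """Hypothetical fallback action: hard braking at -8 m/s^2."""
    return -8.0


def should_activate(gap, v_ego, v_lead, policy, threshold=0.05):
    """Intervene only if the driving policy is risky and braking is safer."""
    risk_policy = collision_risk(gap, v_ego, v_lead, policy)
    if risk_policy <= threshold:
        return False  # the policy handles this state; no intervention needed
    risk_brake = collision_risk(gap, v_ego, v_lead, brake_policy)
    return risk_brake < risk_policy
```

In this toy model, a state with a 10 m gap, an ego speed of 25 m/s, and a lead speed of 20 m/s triggers the safeguard when the vehicle is driven by `cruise_policy`, but not when it is driven by a policy that already brakes, since the estimated risk then stays below the threshold. The same state leads to different activation decisions depending on the driving policy, which is the behavior the abstract describes.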
