Paper Title


Trajectory Planning for Autonomous Vehicles Using Hierarchical Reinforcement Learning

Authors

Naveed, Kaleb Ben, Qiao, Zhiqian, Dolan, John M.

Abstract


Planning safe trajectories under uncertain and dynamic conditions makes the autonomous driving problem significantly more complex. Current sampling-based methods such as Rapidly Exploring Random Trees (RRTs) are not ideal for this problem because of their high computational cost. Supervised learning methods such as Imitation Learning lack generalization and safety guarantees. To address these problems and to ensure a robust framework, we propose a Hierarchical Reinforcement Learning (HRL) structure combined with a Proportional-Integral-Derivative (PID) controller for trajectory planning. HRL helps divide the task of autonomous vehicle driving into sub-goals and supports the network in learning policies for both high-level options and low-level trajectory planner choices. The introduction of sub-goals decreases convergence time and enables the learned policies to be reused in other scenarios. In addition, the proposed planner is made robust by guaranteeing smooth trajectories and by handling the noisy perception system of the ego-car. The PID controller is used for tracking the waypoints, which ensures smooth trajectories and reduces jerk. The problem of incomplete observations is handled by using a Long Short-Term Memory (LSTM) layer in the network. Results from the high-fidelity CARLA simulator indicate that the proposed method reduces convergence time, generates smoother trajectories, and is able to handle dynamic surroundings and noisy observations.
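The waypoint-tracking step in the abstract uses a standard PID control law. The sketch below illustrates that textbook law on a toy 1-D plant; the gains, time step, and first-order plant model are illustrative assumptions, not values or dynamics from the paper.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook PID controller; the gains are illustrative, not from the paper."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """Return the control command for the current tracking error."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Toy usage: drive a 1-D position toward a waypoint at x = 1.0.
# The first-order plant x += u * dt stands in for the ego-vehicle dynamics.
pid = PID(kp=2.0, ki=0.1, kd=0.5)
x, dt = 0.0, 0.1
for _ in range(100):
    u = pid.step(1.0 - x, dt)
    x += u * dt  # the position settles near the waypoint
```

In the paper's pipeline the error fed to such a controller would come from the deviation between the ego-car's pose and the planner's waypoints; the derivative term is what damps abrupt corrections and thus reduces jerk.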
