Paper Title

Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization

Paper Authors

Hart, Patrick; Rychly, Leonard; Knoll, Alois

Paper Abstract

Many current behavior generation methods struggle to handle real-world traffic situations, as they do not scale well with complexity. However, behaviors can be learned offline using data-driven approaches. In particular, reinforcement learning is promising, as it implicitly learns how to behave from collected experiences. In this work, we combine policy-based reinforcement learning with local optimization to synthesize the best of the two methodologies. The policy-based reinforcement learning algorithm provides an initial solution and guiding reference for the post-optimization. The optimizer therefore only has to compute a single homotopy class, e.g., driving behind or in front of another vehicle. The state history stored during reinforcement learning can be used for constraint checking, allowing the optimizer to account for interactions. The post-optimization additionally acts as a safety layer, so the novel method can be applied in safety-critical applications. We evaluate the proposed method on lane-change scenarios with a varying number of vehicles.
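To make the described two-stage pipeline concrete, below is a minimal, hypothetical sketch in Python (NumPy/SciPy), not the authors' implementation: a stand-in for the learned policy produces an initial longitudinal trajectory, the stored state history of another vehicle supplies collision constraints, and a local optimizer refines the trajectory within the homotopy class the policy selected (here: merging behind the other vehicle). All function names, horizons, speeds, and cost weights are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

H = 20          # planning horizon in steps (assumed)
DT = 0.2        # step length in seconds (assumed)
SAFE_GAP = 8.0  # minimum longitudinal gap in meters (assumed)

def rollout_policy(x0):
    """Stand-in for the learned policy: roll out an initial longitudinal
    trajectory. A real system would query the trained policy network here."""
    v = 10.0  # nominal ego speed suggested by the policy (assumed)
    return x0 + v * DT * np.arange(1, H + 1)

def stored_state_history(x0_other, v_other=9.0):
    """State history of another vehicle recorded during the RL rollout; the
    abstract describes using such stored histories for constraint checking."""
    return x0_other + v_other * DT * np.arange(1, H + 1)

def post_optimize(ref_traj, obstacle_traj):
    """Locally refine the policy's trajectory within its homotopy class
    (here: staying behind the other vehicle with a safe gap)."""
    def cost(x):
        smoothness = np.sum(np.diff(x, n=2) ** 2)  # penalize acceleration changes
        tracking = np.sum((x - ref_traj) ** 2)     # stay close to the RL reference
        return smoothness + 0.1 * tracking

    constraints = [{
        "type": "ineq",  # feasible when obstacle_traj - x - SAFE_GAP >= 0
        "fun": lambda x: obstacle_traj - x - SAFE_GAP,
    }]
    result = minimize(cost, ref_traj, method="SLSQP", constraints=constraints)
    return result.x

ego_reference = rollout_policy(x0=0.0)
lead_vehicle = stored_state_history(x0_other=12.0)
ego_optimized = post_optimize(ego_reference, lead_vehicle)
print("minimum gap after post-optimization:", np.min(lead_vehicle - ego_optimized))
```

Because the RL reference already lies in the desired homotopy class and is feasible, the optimizer only smooths and tightens it locally rather than searching over discrete maneuver choices, which mirrors the division of labor described in the abstract.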
