Paper Title
Efficient Connected and Automated Driving System with Multi-agent Graph Reinforcement Learning
Paper Authors
Abstract
Connected and automated vehicles (CAVs) have attracted increasing attention recently. Their fast actuation time gives them the potential to improve the efficiency and safety of the whole transportation system. Due to technical challenges, only a proportion of vehicles can be equipped with automation while the rest remain human-driven. Instead of learning a reliable behavior for the ego automated vehicle alone, we focus on how to improve the outcomes of the whole transportation system by allowing each automated vehicle to learn to cooperate with the others and to regulate the human-driven traffic flow. One state-of-the-art approach is to use reinforcement learning to learn an intelligent decision-making policy. However, a direct reinforcement learning framework cannot improve the performance of the whole system. In this article, we demonstrate that formulating the problem in a multi-agent setting with a shared policy achieves better system performance than a non-shared policy in a single-agent setting. Furthermore, we find that applying an attention mechanism to interaction features can capture the interplay between agents and thereby boost cooperation. To the best of our knowledge, while previous automated driving studies have mainly focused on enhancing individual driving performance, this work serves as a starting point for research on system-level multi-agent cooperation using graph information sharing. We conduct extensive experiments in car-following and unsignalized intersection settings. The results demonstrate that CAVs controlled by our method achieve the best performance against several state-of-the-art baselines.
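To illustrate the kind of attention mechanism the abstract refers to, the sketch below shows scaled dot-product attention in which an ego vehicle's feature vector queries the interaction features of its graph neighbors and aggregates them into a single vector. This is a minimal illustration under assumed names (`attention_pool`, `softmax`) and a simplified single-head, no-projection form — it is not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(ego, neighbors):
    """Aggregate neighbor interaction features via attention.

    ego:       (d,)   feature vector of the ego CAV (the query)
    neighbors: (n, d) interaction features of n neighboring vehicles
    returns:   (d,)   attention-weighted sum of neighbor features
    """
    d_k = ego.shape[-1]
    # Scaled dot-product scores between ego and each neighbor.
    scores = neighbors @ ego / np.sqrt(d_k)   # shape (n,)
    weights = softmax(scores)                 # sum to 1 over neighbors
    return weights @ neighbors                # shape (d,)

# Example: one ego vehicle attending over three neighbors.
ego = np.ones(4)
neighbors = np.arange(12, dtype=float).reshape(3, 4)
pooled = attention_pool(ego, neighbors)
```

In a shared-policy multi-agent setting, the same `attention_pool` (and the downstream policy network) would be applied per agent, so each CAV weighs its own neighborhood while all CAVs share one set of parameters.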