Paper Title
Area-wide traffic signal control based on a deep graph Q-Network (DGQN) trained in an asynchronous manner
Paper Authors
Paper Abstract
Reinforcement learning (RL) algorithms have been widely applied in traffic signal studies. There are, however, several problems in jointly controlling traffic lights for a large transportation network. First, the action space explodes exponentially as the number of intersections to be jointly controlled increases. Although multi-agent RL algorithms have been used to address this curse of dimensionality, they neither guarantee a global optimum nor break ties between joint actions. This problem was circumvented by revising the output structure of a deep Q-network (DQN) within the framework of a single-agent RL algorithm. Second, when mapping traffic states into an action value, it is difficult to consider spatio-temporal correlations over a large transportation network. A deep graph Q-network (DGQN) was devised to efficiently accommodate spatio-temporal dependencies on a large scale. Finally, training an RL model to jointly control traffic lights in a large transportation network requires a long time to converge. An asynchronous update methodology was devised for the DGQN to quickly reach an optimal policy. Using these three remedies, the DGQN succeeded in jointly controlling the traffic lights in a large transportation network in Seoul. This approach outperformed other state-of-the-art RL algorithms as well as an actual fixed-signal operation.
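
To illustrate the output-structure idea described in the abstract, the following is a minimal PyTorch sketch, not the authors' implementation: a single-agent Q-network that shares one graph-style aggregation layer over the road-network adjacency and emits a separate Q-value vector per intersection, so the output grows linearly with the number of intersections rather than exponentially with the joint action space. The class name DGQNSketch, the single graph layer, the hidden sizes, and the adjacency handling are all illustrative assumptions.

import torch
import torch.nn as nn

class DGQNSketch(nn.Module):
    """Illustrative single-agent Q-network with per-intersection output heads."""
    def __init__(self, num_nodes, state_dim, hidden_dim, num_phases):
        super().__init__()
        self.encode = nn.Linear(state_dim, hidden_dim)          # per-node state encoder
        self.graph_weight = nn.Linear(hidden_dim, hidden_dim)   # shared GCN-style weight
        self.q_head = nn.Linear(hidden_dim, num_phases)         # Q-values per intersection

    def forward(self, states, adjacency):
        # states: (batch, num_nodes, state_dim); adjacency: (num_nodes, num_nodes), assumed normalized
        h = torch.relu(self.encode(states))
        h = torch.relu(adjacency @ self.graph_weight(h))         # aggregate neighboring intersections
        return self.q_head(h)                                    # (batch, num_nodes, num_phases)

# Usage: the greedy joint action is one argmax per intersection from a single network output.
if __name__ == "__main__":
    net = DGQNSketch(num_nodes=5, state_dim=8, hidden_dim=32, num_phases=4)
    adj = torch.eye(5)                       # placeholder adjacency (self-loops only)
    q = net(torch.randn(2, 5, 8), adj)       # (2, 5, 4)
    joint_action = q.argmax(dim=-1)          # one signal-phase choice per intersection

Because the joint action is factored across output heads, selecting the greedy action stays tractable even for many intersections; the asynchronous training scheme mentioned in the abstract would sit outside this network, in how experience collection and parameter updates are scheduled.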