Paper Title
Area-wide traffic signal control based on a deep graph Q-Network (DGQN) trained in an asynchronous manner
Paper Authors
Paper Abstract
Reinforcement learning (RL) algorithms have been widely applied in traffic signal studies. There are, however, several problems in jointly controlling traffic lights for a large transportation network. First, the action space explodes exponentially as the number of intersections to be jointly controlled increases. Although multi-agent RL algorithms have been used to address this curse of dimensionality, they neither guarantee a global optimum nor break ties between joint actions. This problem was circumvented by revising the output structure of a deep Q-network (DQN) within the framework of a single-agent RL algorithm. Second, when mapping traffic states into an action value, it is difficult to consider spatio-temporal correlations over a large transportation network. A deep graph Q-network (DGQN) was devised to efficiently accommodate spatio-temporal dependencies on a large scale. Finally, training an RL model to jointly control traffic lights in a large transportation network requires a long time to converge. An asynchronous update methodology was devised for the DGQN to quickly reach an optimal policy. Using these three remedies, the DGQN succeeded in jointly controlling the traffic lights in a large transportation network in Seoul. This approach outperformed other state-of-the-art RL algorithms as well as an actual fixed-signal operation.
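
To illustrate the output-structure idea described in the abstract, the following is a minimal PyTorch sketch, not the authors' implementation: a single-agent Q-network that shares one graph-style aggregation layer over the road-network adjacency and emits a separate Q-value vector per intersection, so the output grows linearly with the number of intersections rather than exponentially with the joint action space. The class name DGQNSketch, the single graph layer, the hidden sizes, and the adjacency handling are all illustrative assumptions.

import torch
import torch.nn as nn

class DGQNSketch(nn.Module):
    """Illustrative single-agent Q-network with per-intersection output heads."""
    def __init__(self, num_nodes, state_dim, hidden_dim, num_phases):
        super().__init__()
        self.encode = nn.Linear(state_dim, hidden_dim)          # per-node state encoder
        self.graph_weight = nn.Linear(hidden_dim, hidden_dim)   # shared GCN-style weight
        self.q_head = nn.Linear(hidden_dim, num_phases)         # Q-values per intersection

    def forward(self, states, adjacency):
        # states: (batch, num_nodes, state_dim); adjacency: (num_nodes, num_nodes), assumed normalized
        h = torch.relu(self.encode(states))
        h = torch.relu(adjacency @ self.graph_weight(h))         # aggregate neighboring intersections
        return self.q_head(h)                                    # (batch, num_nodes, num_phases)

# Usage: the greedy joint action is one argmax per intersection from a single network output.
if __name__ == "__main__":
    net = DGQNSketch(num_nodes=5, state_dim=8, hidden_dim=32, num_phases=4)
    adj = torch.eye(5)                       # placeholder adjacency (self-loops only)
    q = net(torch.randn(2, 5, 8), adj)       # (2, 5, 4)
    joint_action = q.argmax(dim=-1)          # one signal-phase choice per intersection

Because the joint action is factored across output heads, selecting the greedy action stays tractable even for many intersections; the asynchronous training scheme mentioned in the abstract would sit outside this network, in how experience collection and parameter updates are scheduled.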