Coordinated Sequential Optimization for Network-wide Traffic Signal Control Based on Heterogeneous Multi-agent Transformer

Cited by: 0
Authors
Chen X. [1, 3]
Zhu Y. [2]
Xie N. [1, 3]
Geng M. [1, 3]
Lv C. [3]
Affiliations
[1] Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou
[2] Institute of Intelligent Transportation Systems, Polytechnic Institute, Zhejiang University, Hangzhou
[3] College of Civil Engineering and Architecture, Zhejiang University, Hangzhou
Funding
National Natural Science Foundation of China
Keywords
deep reinforcement learning; heterogeneous multi-agent; intelligent transportation; network-wide traffic signal control; spatio-temporal pressure reward
DOI
10.16097/j.cnki.1009-6744.2024.03.012
Abstract
To address the complex task of traffic signal control in an urban network, this study proposes a coordinated sequential optimization method based on a Heterogeneous Multi-Agent Transformer (HMATLight) that optimizes network-wide traffic signals and improves the signal control policies at intersections within the network. First, considering the spatial correlation of traffic flow across multiple intersections, a value encoder based on a self-attention mechanism is designed to learn representations of traffic observations and to realize network-level communication. Second, to cope with the non-stationary environment that arises when multiple agents update their policies, a policy decoder based on multi-agent advantage decomposition is constructed; it sequentially outputs each agent's optimal responsive action conditioned on the joint actions of the preceding agents. In addition, an action-masking mechanism based on effective driving vehicles, which adapts the decision frequency within the time-adequate interval, and a spatio-temporal pressure reward function that accounts for waiting fairness are constructed to further enhance the performance and practicality of the policy. A series of experiments on Hangzhou network datasets validates the effectiveness of the proposed method. The results show that HMATLight outperforms all baselines on five metrics across two datasets. Compared with the best-performing baseline, HMATLight reduces the average travel time by 10.89%, the average queue length by 18.84%, and the average waiting time by 22.21%. Furthermore, HMATLight generalizes substantially better and significantly reduces the number of instances of long vehicle waiting times. © 2024 Science Press. All rights reserved.
Pages: 114-126
Number of pages: 12
Related papers
22 entries in total
  • [1] ROBERTSON D I., "TRANSYT" method for area traffic control, Traffic Engineering & Control, 10, 6, pp. 181-182, (1969)
  • [2] ROBERTSON D I, BRETHERTON R D., Optimizing networks of traffic signals in real time-the SCOOT method, IEEE Transactions on Vehicular Technology, 40, 1, pp. 11-15, (1991)
  • [3] SIMS A G, DOBINSON K W., The Sydney coordinated adaptive traffic (SCAT) system philosophy and benefits, IEEE Transactions on Vehicular Technology, 29, 2, pp. 130-137, (1980)
  • [4] VARAIYA P., Max pressure control of a network of signalized intersections, Transportation Research Part C: Emerging Technologies, 36, pp. 177-195, (2013)
  • [5] MA D F, CHEN X, WU X D, et al., Mixed-coordinated decision-making method for arterial signals based on reinforcement learning, Journal of Transportation Systems Engineering and Information Technology, 22, 2, pp. 145-153, (2022)
  • [6] WEI H, CHEN C, ZHENG G, et al., PressLight: Learning max pressure control to coordinate traffic signals in arterial network, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (2019)
  • [7] ZHENG G J, XIONG Y H, ZANG X S, et al., Learning phase competition for traffic signal control, Proceedings of the 28th ACM International Conference on Information and Knowledge Management, (2019)
  • [8] CHEN C C, WEI H, XU N, et al., Toward a thousand lights: Decentralized deep reinforcement learning for large-scale traffic signal control, Proceedings of the AAAI Conference on Artificial Intelligence, (2020)
  • [9] PAPOUDAKIS G, CHRISTIANOS F, RAHMAN A, et al., Dealing with non-stationarity in multi-agent deep reinforcement learning, 1906, (2019)
  • [10] WEI H, XU N, ZHANG H C, et al., CoLight: Learning network-level cooperation for traffic signal control, Proceedings of the 28th ACM International Conference on Information and Knowledge Management, (2019)