Coordinated Sequential Optimization for Network-wide Traffic Signal Control Based on Heterogeneous Multi-agent Transformer

Cited by: 0

Authors:

Chen X. ^{[1,3]}

Zhu Y. ^{[2]}

Xie N. ^{[1,3]}

Geng M. ^{[1,3]}

Lv C. ^{[3]}

Affiliations:

[1] Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou

[2] Institute of Intelligent Transportation Systems, Polytechnic Institute, Zhejiang University, Hangzhou

[3] College of Civil Engineering and Architecture, Zhejiang University, Hangzhou

Source:

Jiaotong Yunshu Xitong Gongcheng Yu Xinxi/Journal of Transportation Systems Engineering and Information Technology | 2024 / Vol. 24 / No. 03

Funding:

National Natural Science Foundation of China

Keywords:

deep reinforcement learning; heterogeneous multi-agent; intelligent transportation; network-wide traffic signal control; spatio-temporal pressure reward;

DOI:

10.16097/j.cnki.1009-6744.2024.03.012

Abstract:

Focusing on the complex task of traffic signal control in urban networks, this study proposes HMATLight, a coordinated sequential optimization method based on a Heterogeneous Multi-Agent Transformer, to optimize network-wide traffic signals and improve the performance of the signal control policy at intersections within the network. Specifically, to account for the spatial correlation of traffic flow across multiple intersections, a value encoder based on a self-attention mechanism is first designed to learn representations of traffic observations and to realize network-level communication. Second, to address the non-stationary environment that arises during multi-agent policy updates, a policy decoder based on multi-agent advantage decomposition is constructed; it sequentially outputs each agent's optimal responsive action conditioned on the joint actions of the preceding agents. In addition, an action-masking mechanism based on effective driving vehicles, which adapts the decision frequency within the time-adequate interval, and a spatio-temporal pressure reward function that accounts for waiting fairness are constructed to further enhance policy performance and practicality. A series of experiments is carried out on Hangzhou network datasets to validate the proposed method. The results show that HMATLight outperforms all baselines on two datasets across five metrics. Compared with the best-performing baseline, HMATLight decreases the average travel time by 10.89%, the average queue length by 18.84%, and the average waiting time by 22.21%. Furthermore, HMATLight generalizes markedly better and significantly reduces instances of long vehicle waiting times. © 2024 Science Press. All rights reserved.
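
The abstract names an action-masking mechanism but does not give its rule. As a rough illustration of action masking in general, not of the authors' implementation, the sketch below zeroes out the probability of disallowed signal phases before an action is chosen. The function names `phase_mask` and `masked_softmax`, and the rule "a phase is selectable only if it serves at least one effective vehicle", are assumptions made for this example.

```python
import math


def phase_mask(effective_vehicles):
    """Hypothetical masking rule (an assumption, not taken from the paper):
    a phase is selectable only if it serves at least one effective vehicle.
    If no phase qualifies, every phase is allowed so a choice always exists."""
    mask = [count > 0 for count in effective_vehicles]
    return mask if any(mask) else [True] * len(mask)


def masked_softmax(logits, mask):
    """Softmax over the allowed entries; masked entries get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative numbers only: four phases, two of which serve no vehicles.
logits = [2.0, 0.5, 1.0, -1.0]
vehicles = [0, 3, 5, 0]
probs = masked_softmax(logits, phase_mask(vehicles))
```

With these illustrative inputs, the first and last phases receive zero probability even though the first has the largest raw score, and the remaining probability mass is shared between the two phases that serve vehicles.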

Pages: 114-126

Page count: 12

