Proximal policy optimization learning based control of congested freeway traffic

Cited: 0
Authors
Mo, Shurong [1 ]
Wu, Nailong [1 ,2 ,6 ]
Qi, Jie [1 ,2 ]
Pan, Anqi [1 ]
Feng, Zhiguang [3 ]
Yan, Huaicheng [4 ]
Wang, Yueying [5 ]
Affiliations
[1] Donghua Univ, Coll Informat Sci & Technol, Shanghai, Peoples R China
[2] Donghua Univ, Minist Educ, Engn Res Ctr Digitized Text & Apparel Technol, Shanghai, Peoples R China
[3] Harbin Engn Univ, Coll Automat, Harbin, Peoples R China
[4] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai, Peoples R China
[5] Shanghai Univ, Sch Mechatron Engn & Automation, Shanghai, Peoples R China
[6] Donghua Univ, Coll Informat Sci & Technol, Shanghai 201620, Peoples R China
Source: Optimal Control Applications and Methods
Funding: National Natural Science Foundation of China
Keywords
adaptive control; adaptive cruise control; input delay; proximal policy optimization; traffic flow; FEEDBACK-CONTROL; FLOW; INSTABILITY; SYSTEMS; MODELS; WAVES;
DOI: 10.1002/oca.3068
CLC classification: TP [Automation Technology, Computer Technology]
Discipline code: 0812
Abstract
In this paper, a delay-compensating feedback controller based on reinforcement learning is proposed that introduces the proximal policy optimization (PPO) scheme to adjust the time gap of adaptive cruise control (ACC) vehicles in congested traffic. The freeway traffic flow is characterized by the 2×2 Aw-Rascle-Zhang (ARZ) model, a system of nonlinear first-order partial differential equations (PDEs). Unlike the backstepping delay-compensation control [23], the PPO controller proposed in this paper is a feedback of the current traffic flow velocity, the current traffic flow density, and the control input of the previous step. Since the system dynamics of the traffic flow are difficult to express mathematically, the gains of these three feedback terms are determined by learning from the interaction between the PPO agent and a digital simulator of the traffic system. The performance of Lyapunov control, backstepping control, and PPO control is compared in numerical simulations. The results demonstrate that PPO control is superior to Lyapunov control in terms of convergence rate and control effort for the traffic system without delay. For the traffic system with a destabilizing input delay, the performance of the PPO controller is comparable to that of the backstepping controller. Moreover, the PPO controller is more robust than the backstepping controller when a sensitive model parameter is perturbed by Gaussian noise.
Graphical abstract: A delay compensation feedback controller based on reinforcement learning is proposed, utilizing the Proximal Policy Optimization (PPO) algorithm to adaptively adjust the gains of the cruise controller, thereby regulating vehicle time gaps in traffic congestion. The congested traffic flow is described using the Aw-Rascle-Zhang model. Numerical simulations are conducted to compare the performance of Lyapunov control, backstepping control, and PPO control. The results demonstrate the superior convergence and robustness of the proposed method.
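For reference, the ARZ model mentioned above takes, up to the particular pressure and relaxation terms adopted in the paper, the form ρ_t + (ρv)_x = 0 and (v + p(ρ))_t + v(v + p(ρ))_x = (V(ρ) − v)/τ, where ρ is traffic density, v is velocity, p is the traffic pressure, V is the equilibrium speed profile, and τ is the relaxation time. The following is a minimal sketch, not the authors' implementation, of the controller structure described in the abstract: a PPO-trained policy outputs three feedback gains that multiply the velocity deviation, the density deviation, and the previous control input, and the policy is trained against a traffic simulator. The environment class, the placeholder dynamics in _simulate_arz, the reward, and all numerical values are illustrative assumptions; the paper's digital simulator solves the full ARZ PDEs with input delay, and its policy architecture may differ.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class ARZTrafficEnv(gym.Env):
    """Hypothetical training environment for the gain-feedback ACC controller.
    Observation: (velocity deviation, density deviation, previous control input).
    Action: the three feedback gains k1, k2, k3.  All values are illustrative."""

    def __init__(self, v_star=10.0, rho_star=0.12, dt=0.25, horizon=240):
        super().__init__()
        self.v_star, self.rho_star = v_star, rho_star      # equilibrium profile
        self.dt, self.horizon = dt, horizon
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)
        self.action_space = spaces.Box(-2.0, 2.0, shape=(3,), dtype=np.float32)

    def _obs(self):
        return np.array([self.v - self.v_star,
                         self.rho - self.rho_star,
                         self.u_prev], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.u_prev = 0, 0.0
        self.v, self.rho = 8.0, 0.15                       # congested initial state
        return self._obs(), {}

    def step(self, action):
        k1, k2, k3 = action
        # Feedback law with the structure from the abstract: gains multiply the
        # measured deviations and the previous one-step control input.
        u = float(np.clip(k1 * (self.v - self.v_star)
                          + k2 * (self.rho - self.rho_star)
                          + k3 * self.u_prev, -5.0, 5.0))
        self.v, self.rho = self._simulate_arz(u)
        self.u_prev = u
        self.t += 1
        # Penalize distance from the equilibrium profile plus a small control cost.
        reward = -((self.v - self.v_star) ** 2
                   + (self.rho - self.rho_star) ** 2
                   + 0.01 * u ** 2)
        return self._obs(), reward, False, self.t >= self.horizon, {}

    def _simulate_arz(self, u):
        # Stand-in for one step of a finite-difference ARZ solver with input delay.
        v = self.v + self.dt * (self.v_star - self.v + u)
        rho = self.rho + self.dt * (self.rho_star - self.rho)
        return v, rho

if __name__ == "__main__":
    env = ARZTrafficEnv()
    agent = PPO("MlpPolicy", env, verbose=0)               # PPO learns the gain policy
    agent.learn(total_timesteps=50_000)

In this sketch the policy emits the gains at every step; learning a single fixed gain vector, as the abstract's wording suggests, would correspond to holding the action constant over an episode.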
Pages: 719-736
Number of pages: 18
Related papers (50 in total; entries [42]-[50] shown)
  • [42] A Machine Learning Method for Dynamic Traffic Control and Guidance on Freeway Networks
    Wen, Kaige
    Qu, Shiru
    Zhang, Yumei
    2009 INTERNATIONAL ASIA CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION, AND ROBOTICS, PROCEEDINGS, 2009 : 67 - 71
  • [43] Vibration control of three coupled flexible beams using reinforcement learning algorithm based on proximal policy optimization
    Qiu, Zhi-cheng
    Du, Jia-hao
    Zhang, Xian-min
    JOURNAL OF INTELLIGENT MATERIAL SYSTEMS AND STRUCTURES, 2022, 33 (20) : 2578 - 2603
  • [44] Proximal Policy Optimization-Based Driving Control Strategy of Connected Cruise Vehicle Platoons to Improve Traffic Efficiency and Safety
    Xu, Zhanrui
    Jiao, Xiaohong
    Ru, Shuangkun
    TRANSPORTATION RESEARCH RECORD, 2023, 2677 (06) : 58 - 72
  • [45] Action Masking-Based Proximal Policy Optimization With the Dual-Ring Phase Structure for Adaptive Traffic Signal Control
    Fan, Shuying
    Lu, Kai
    Wang, Yinhai
    Tian, Xin
    Zhang, Minxue
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025, 26 (02) : 2422 - 2433
  • [46] Signal Control in a Congested Traffic Area
    Krylatov, Alexander Y.
    Zakharov, Victor V.
    Malygin, Igor G.
    2015 INTERNATIONAL CONFERENCE "STABILITY AND CONTROL PROCESSES" IN MEMORY OF V.I. ZUBOV (SCP), 2015 : 475 - 478
  • [47] CIPPO: Contrastive Imitation Proximal Policy Optimization for Recommendation Based on Reinforcement Learning
    Chen, Weilong
    Zhang, Shaoliang
    Xie, Ruobing
    Xia, Feng
    Lin, Leyu
    Zhang, Xinran
    Wang, Yan
    Zhang, Yanru
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (11) : 5753 - 5767
  • [48] Learning to walk with biped robot based on an improved proximal policy optimization algorithm
    Zhang, Chao
    Zhong, Peisi
    Liang, Zhongyuan
    Liu, Mei
    Wang, Xiao
    Liu, Jinming
    INTERNATIONAL CONFERENCE ON INTELLIGENT EQUIPMENT AND SPECIAL ROBOTS (ICIESR 2021), 2021, 12127
  • [49] Efficiency and Equity Based Freeway Traffic Network Flow Control
    Wen, Kaige
    Yang, Wugang
    Qu, Shiru
    2010 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING (ICCAE 2010), VOL 5, 2010 : 453 - 456
  • [50] A control scheme for freeway traffic systems based on hybrid automata
    Sacone, Simona
    Siri, Silvia
    DISCRETE EVENT DYNAMIC SYSTEMS, 2012, 22 : 3 - 25