Assessment of Reward Functions for Reinforcement Learning Traffic Signal Control under Real-World Limitations

Cited: 0
Authors
Egea, Alvaro Cabrejas [1 ,2 ]
Howell, Shaun [2 ]
Knutins, Maksis [2 ]
Connaughton, Colm [3 ]
Affiliations
[1] Univ Warwick, MathSys Ctr Doctoral Training, Coventry CV4 7AL, W Midlands, England
[2] Vivacity Labs, London NW5 3AQ, England
[3] Univ Warwick, Warwick Math Inst, Coventry CV4 7AL, W Midlands, England
Funding
Innovate UK project;
Keywords
Reinforcement Learning; Urban Traffic Control; Smart Cities; Agent-Based Modeling;
DOI
10.1109/smc42975.2020.9283498
CLC classification
TP3 [Computing technology, computer technology];
Subject classification code
0812 ;
Abstract
Adaptive traffic signal control is one key avenue for mitigating the growing consequences of traffic congestion. Incumbent solutions such as SCOOT and SCATS require regular, time-consuming calibration, cannot optimise well for multiple road-use modalities, and require manual curation of many implementation plans. A recent alternative to these approaches is deep reinforcement learning, in which an agent, using neural networks as function approximators, learns to take the most appropriate action for a given state of the system. Learning is guided by a reward function that provides feedback to the agent on the performance of the actions taken, making the resulting behaviour sensitive to the specific reward function chosen. Several authors have surveyed the reward functions used in the literature, but attributing outcome differences to the choice of reward function across works is problematic, as there are many uncontrolled differences between studies as well as differing outcome metrics. This paper compares the performance of agents using different reward functions in a simulation of a junction in Greater Manchester, UK, across various demand profiles and subject to real-world constraints: realistic sensor inputs, controllers, calibrated demand, intergreen times, and stage sequencing. The reward metrics considered are based on time spent stopped, lost time, change in lost time, average speed, queue length, junction throughput, and variations of these quantities. The performance of these reward functions is compared in terms of total waiting time. We find that speed maximisation resulted in the lowest average waiting times across all demand levels, displaying significantly better performance than other rewards previously introduced in the literature.
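The reward metrics enumerated in the abstract can each be expressed as a simple function of per-step junction measurements. The sketch below is an illustrative reconstruction only, not the paper's implementation: the `JunctionState` fields and function names are assumptions, and real systems would derive these quantities from sensor data in a calibrated simulator.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class JunctionState:
    """Hypothetical snapshot of sensor-derived quantities for one control step."""
    vehicle_speeds: List[float]  # current speeds of detected vehicles (m/s)
    queue_lengths: List[int]     # per-approach queue lengths (vehicles)
    stopped_time: float          # total time vehicles spent stopped this step (s)
    lost_time: float             # cumulative delay versus free-flow travel (s)
    prev_lost_time: float        # lost time recorded at the previous step (s)
    throughput: int              # vehicles that cleared the junction this step

def reward_stopped_time(s: JunctionState) -> float:
    # Penalise total time spent stopped: more stopping -> more negative reward.
    return -s.stopped_time

def reward_lost_time(s: JunctionState) -> float:
    # Penalise cumulative delay relative to free-flow travel.
    return -s.lost_time

def reward_delta_lost_time(s: JunctionState) -> float:
    # Reward reductions in lost time between consecutive control steps.
    return s.prev_lost_time - s.lost_time

def reward_avg_speed(s: JunctionState) -> float:
    # Speed maximisation, the variant the paper reports as best-performing.
    if not s.vehicle_speeds:
        return 0.0
    return sum(s.vehicle_speeds) / len(s.vehicle_speeds)

def reward_queue_length(s: JunctionState) -> float:
    # Penalise the total number of queued vehicles across approaches.
    return -float(sum(s.queue_lengths))

def reward_throughput(s: JunctionState) -> float:
    # Reward vehicles discharged through the junction this step.
    return float(s.throughput)
```

Under this framing, comparing reward functions amounts to training otherwise identical agents that differ only in which of these scalar signals they receive each step.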
Pages: 965-972
Page count: 8