SMAC-tuned Deep Q-learning for Ramp Metering

Times Cited: 0
Authors
ElSamadisy, Omar [1 ,3 ]
Abdulhai, Yazeed [1 ]
Xue, Haoyuan [2 ]
Smirnov, Ilia [1 ]
Khalil, Elias B. [2 ]
Abdulhai, Baher [1 ]
Affiliations
[1] Univ Toronto, Dept Civil Engn, Toronto, ON, Canada
[2] Univ Toronto, Dept Mech & Ind Engn, Toronto, ON, Canada
[3] Arab Acad Sci Technol & Maritime Transport, Coll Engn & Technol, Dept Elect Commun Engn, Alexandria, Egypt
Keywords
Ramp metering; Reinforcement learning; Hyperparameter tuning
DOI
10.1109/SM57895.2023.10112246
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification Code
081203; 0835
Abstract
The demand for transportation increases as a city's population grows, yet significant infrastructure expansion is rarely feasible because of spatial, financial, and environmental constraints. As a result, improving the efficiency of existing infrastructure becomes increasingly critical. Ramp metering (RM) with deep reinforcement learning (RL) is one method to tackle this problem. However, fine-tuning RL hyperparameters for RM has yet to be explored in the literature, potentially leaving performance improvements on the table. In this paper, the Sequential Model-based Algorithm Configuration (SMAC) method is used to fine-tune two essential hyperparameters of a deep RL ramp metering model: the reward discount factor and the decay rate of the explore/exploit ratio. Around 350 experiments with different configurations were run with PySMAC (a Python interface to the hyperparameter optimization tool SMAC) and compared against random search as a baseline. The best reward discount factor indicates that the RL agent should focus on immediate rewards and pay little attention to future rewards. The selected value for the exploration-ratio decay rate shows that the agent should reduce exploration early in training. Both random search and SMAC show the same performance improvement of 19%.
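To make the setup concrete, the following is a minimal sketch of how the two hyperparameters could be handed to SMAC. It is written against the modern SMAC3/ConfigSpace Python API rather than the PySMAC interface the paper used, and the objective function is a hypothetical placeholder standing in for the actual deep Q-learning ramp-metering simulation.

    # Minimal sketch of the tuning loop described in the abstract.
    # Assumptions: the SMAC3 + ConfigSpace packages (the paper itself used
    # PySMAC), and a dummy objective in place of the real traffic simulation.
    import math

    from ConfigSpace import ConfigurationSpace
    from ConfigSpace.hyperparameters import UniformFloatHyperparameter
    from smac import HyperparameterOptimizationFacade, Scenario

    # Search space: the two hyperparameters the paper tunes.
    cs = ConfigurationSpace()
    cs.add_hyperparameter(UniformFloatHyperparameter("gamma", 0.0, 0.99))
    cs.add_hyperparameter(
        UniformFloatHyperparameter("eps_decay", 1e-5, 1e-1, log=True)
    )

    def epsilon(step, eps_decay, eps_start=1.0, eps_end=0.05):
        """Exponentially decayed explore/exploit ratio: a larger eps_decay
        cuts exploration earlier, which is what the paper's result favours."""
        return eps_end + (eps_start - eps_end) * math.exp(-eps_decay * step)

    def train_and_evaluate(config, seed: int = 0) -> float:
        """Hypothetical objective: train a deep Q-learning ramp metering
        agent with the sampled gamma and eps_decay and return a cost
        (e.g. negative mean reward). A dummy quadratic keeps this runnable."""
        gamma, eps_decay = config["gamma"], config["eps_decay"]
        # ... in the real study: run the traffic simulation, annealing the
        # exploration rate with epsilon(step, eps_decay) at each step ...
        return (gamma - 0.1) ** 2 + (math.log10(eps_decay) + 2.0) ** 2

    # Roughly 350 trials, matching the experiment budget in the abstract.
    scenario = Scenario(cs, n_trials=350, deterministic=True)
    smac = HyperparameterOptimizationFacade(scenario, train_and_evaluate)
    incumbent = smac.optimize()
    print("Best configuration:", incumbent)

In the actual study, each trial would train the RL agent in a traffic simulator and return a cost derived from the achieved reward; the incumbent configuration would then yield the low discount factor and fast exploration decay reported above.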
Pages: 65-72
Number of pages: 8