A batch reinforcement learning approach to vacant taxi routing

Cited by: 7
Authors
Yu, Xinlian [1]
Gao, Song [2]
Affiliations
[1] Southeast University, School of Transportation, Nanjing, People's Republic of China
[2] University of Massachusetts, Department of Civil and Environmental Engineering, Amherst, MA, USA
Keywords
Vacant taxi routing; Markov decision process; Batch reinforcement learning; Fitted Q-iteration; MARKOV DECISION-PROCESS; GO; FRAMEWORK; NETWORKS; FLEET; MODEL; GAME
DOI
10.1016/j.trc.2022.103640
Chinese Library Classification (CLC) number
U [Transportation]
Discipline code
08; 0823
Abstract
The optimal routing of a single vacant taxi is formulated as a Markov Decision Process (MDP) problem, with the objective of maximizing profit over a full working period in a transportation network. A batch offline reinforcement learning (RL) method is proposed to learn action values and the optimal policy from archived trajectory data. The method is model-free, in that no state transition model is needed. It is more efficient than the commonly used online RL methods, which rely on interactions with a simulator, because it processes transitions in batches and reuses transition experiences.

The batch RL method is evaluated in a large network of Shanghai, China, with GPS trajectories of over 12,000 taxis. Training is conducted with two datasets: a synthetic dataset, whose state transitions are generated in a simulator with a postulated system dynamics model (Yu et al., 2019) whose parameters are derived from observed data; and a real-world dataset, whose state transitions are extracted from observed taxi trajectories.

The batch RL method is more computationally efficient, reducing training time by a factor of several dozen compared with the online Q-learning method. Its performance, in terms of average profit per hour and occupancy rate, is assessed in the simulator against a baseline model (the random walk) and an upper bound generated by the exact Dynamic Programming (DP) method based on the same system model as the simulator. The batch RL models trained on simulated and on observed trajectories both outperform the random walk, and the advantage increases with the training sample size. The batch RL model trained on simulated trajectories achieves 95% of the performance upper bound with 30-minute time intervals, suggesting that the model-free method is highly effective.
The batch RL model trained on observed data achieves around 90% of the performance upper bound with 30-minute time intervals; the shortfall is attributed to the discrepancy between the training and evaluation environments. Its performance in the real world is expected to be similarly good, since training and evaluation would then be based on the same environment.
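The abstract describes the batch method only at a high level. The following is a minimal sketch of fitted Q-iteration on a fixed batch of logged transitions, under loudly stated assumptions: a hypothetical toy world (four zones on a line, six decision epochs, made-up pickup probabilities, fare, and cruising cost), not the paper's Shanghai network or its system dynamics model, and a tabular sample-mean "regressor" (the abstract does not say which regressor the authors use). It logs transitions under a random walk, learns Q-values from that batch without using the transition model, and then, mirroring the evaluation design in miniature, scores the greedy policy, the random walk, and the exact DP upper bound with the known toy model. All names (`outcomes`, `fitted_q_iteration`, `P_PICKUP`, and so on) are illustrative.

```python
import random
from collections import defaultdict

# Hypothetical toy world, NOT the paper's network: zones 0..N_ZONES-1 on a line,
# T decision epochs per shift (think of 30-minute intervals).
N_ZONES, T = 4, 6
ACTIONS = (-1, 0, 1)              # vacant taxi cruises left, stays, or cruises right
P_PICKUP = [0.1, 0.5, 0.3, 0.7]   # assumed pickup probability in each zone
FARE, CRUISE_COST = 10.0, 1.0     # assumed fare per trip and cost per epoch


def outcomes(z, a):
    """(probability, reward, next_zone) triples under the assumed dynamics."""
    z2 = min(max(z + a, 0), N_ZONES - 1)
    p = P_PICKUP[z2]
    # A pickup earns the fare; the passenger is dropped in a uniformly random zone.
    out = [(p / N_ZONES, FARE - CRUISE_COST, d) for d in range(N_ZONES)]
    out.append((1 - p, -CRUISE_COST, z2))  # no pickup: still vacant in zone z2
    return out


def evaluate(policy):
    """Exact expected shift profit of policy(t, z) -> [(prob, action)], averaged over start zones."""
    V = [[0.0] * N_ZONES for _ in range(T + 1)]
    for t in reversed(range(T)):
        for z in range(N_ZONES):
            V[t][z] = sum(q * p * (r + V[t + 1][z2])
                          for q, a in policy(t, z) for p, r, z2 in outcomes(z, a))
    return sum(V[0]) / N_ZONES


def dp_upper_bound():
    """Backward induction with the known model: the optimal value (upper bound)."""
    V = [[0.0] * N_ZONES for _ in range(T + 1)]
    for t in reversed(range(T)):
        for z in range(N_ZONES):
            V[t][z] = max(sum(p * (r + V[t + 1][z2]) for p, r, z2 in outcomes(z, a))
                          for a in ACTIONS)
    return sum(V[0]) / N_ZONES


def random_walk(t, z):
    """Baseline: pick each action with equal probability."""
    return [(1 / len(ACTIONS), a) for a in ACTIONS]


def sample_batch(n_episodes, rng):
    """Archived (t, z, a, r, z_next) transitions logged under a random walk."""
    batch = []
    for _ in range(n_episodes):
        z = rng.randrange(N_ZONES)
        for t in range(T):
            a = rng.choice(ACTIONS)
            probs, rewards, nexts = zip(*outcomes(z, a))
            k = rng.choices(range(len(probs)), weights=probs)[0]
            batch.append((t, z, a, rewards[k], nexts[k]))
            z = nexts[k]
    return batch


def fitted_q_iteration(batch, n_iters=T):
    """Model-free FQI: repeatedly regress r + max_a' Q(t+1, z', a') on (t, z, a).
    The 'regressor' here is a per-key sample mean (tabular assumption)."""
    Q = {}
    for _ in range(n_iters):
        sums, counts = defaultdict(float), defaultdict(int)
        for t, z, a, r, z2 in batch:
            # No future value after the last epoch; unseen pairs default to 0.
            future = 0.0 if t + 1 == T else max(Q.get((t + 1, z2, b), 0.0) for b in ACTIONS)
            sums[(t, z, a)] += r + future
            counts[(t, z, a)] += 1
        Q = {key: sums[key] / counts[key] for key in sums}
    return Q


def greedy(Q):
    """Deterministic policy that picks the action with the highest learned Q-value."""
    return lambda t, z: [(1.0, max(ACTIONS, key=lambda a: Q.get((t, z, a), float("-inf"))))]


if __name__ == "__main__":
    Q = fitted_q_iteration(sample_batch(2000, random.Random(0)))
    upper = dp_upper_bound()
    print(f"random walk: {evaluate(random_walk) / upper:.0%} of the DP bound")
    print(f"batch FQI:   {evaluate(greedy(Q)) / upper:.0%} of the DP bound")
```

Because the epoch index is part of the state, each iteration propagates value one epoch further back, so `T` iterations suffice in this tabular case. The same batch is reused on every iteration, which is the source of the efficiency the abstract contrasts with online Q-learning; with function approximators such as regression trees, the regressor and the number of iterations become tuning choices.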
Pages: 19