A batch reinforcement learning approach to vacant taxi routing

Cited: 7
Authors
Yu, Xinlian [1 ]
Gao, Song [2 ]
Affiliations
[1] Southeast Univ, Sch Transportat, Nanjing, Peoples R China
[2] Univ Massachusetts, Dept Civil & Environm Engn, Amherst, MA USA
Keywords
Vacant taxi routing; Markov decision process; Batch reinforcement learning; Fitted Q-iteration; MARKOV DECISION-PROCESS; GO; FRAMEWORK; NETWORKS; FLEET; MODEL; GAME;
DOI
10.1016/j.trc.2022.103640
Chinese Library Classification
U [Transportation];
Discipline Code
08; 0823;
Abstract
The optimal routing of a single vacant taxi is formulated as a Markov Decision Process (MDP) problem to account for profit maximization over a full working period in a transportation network. A batch offline reinforcement learning (RL) method is proposed to learn action values and the optimal policy from archived trajectory data. The method is model-free, in that no state transition model is needed. It is more efficient than the commonly used online RL methods based on interactions with a simulator, due to batch processing and reuse of transition experiences. The batch RL method is evaluated in a large network of Shanghai, China with GPS trajectories of over 12,000 taxis. The training is conducted with two datasets: one is a synthetic dataset where state transitions are generated in a simulator with a postulated system dynamics model (Yu et al., 2019) whose parameters are derived from observed data; the other contains real-world state transitions extracted from observed taxi trajectories. The batch RL method is more computationally efficient, reducing training time by a factor of several dozen compared with the online Q-learning method. Its performance in terms of average profit per hour and occupancy rate is assessed in the simulator against that of a baseline model, the random walk, and an upper bound generated by the exact Dynamic Programming (DP) method based on the same system model as the simulator. The batch RL policies trained on simulated and observed trajectories both outperform the random walk, and the advantage increases with the training sample size. The batch RL trained on simulated trajectories achieves 95% of the performance upper bound with 30-minute time intervals, suggesting that the model-free method is highly effective.
The batch RL trained on observed data achieves around 90% of the performance upper bound with 30-minute time intervals, due to the discrepancy between the training and evaluation environments; its real-world performance is expected to be similarly good, since training and evaluation would then be based on the same environment.
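The abstract's core technique, Fitted Q-Iteration, can be illustrated with a minimal sketch: action values are learned by repeatedly regressing Bellman targets over a fixed batch of archived transitions, with no simulator interaction during training. The tabular setup below (discrete states standing in for zone/time pairs, discrete routing actions) and all names and toy data are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of batch Fitted Q-Iteration (FQI), assuming discrete
# states (e.g. zone/time-interval pairs) and actions. Illustrative only.
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions,
                       gamma=0.9, n_iters=50):
    """transitions: list of (s, a, r, s_next); s_next is None if terminal."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        sums = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        for s, a, r, s_next in transitions:
            # Bellman target computed from the archived batch only.
            target = r if s_next is None else r + gamma * Q[s_next].max()
            sums[s, a] += target
            counts[s, a] += 1
        # "Fit" step: average targets per (state, action);
        # keep the old value where the batch has no data.
        mask = counts > 0
        avg = np.divide(sums, counts, where=mask, out=np.zeros_like(sums))
        Q = np.where(mask, avg, Q)
    return Q

# Toy batch: action 1 from state 0 reaches state 1, where action 0
# yields reward 1 and ends the episode.
batch = [(0, 1, 0.0, 1), (1, 0, 1.0, None), (0, 0, 0.0, 0)]
Q = fitted_q_iteration(batch, n_states=2, n_actions=2)
policy = Q.argmax(axis=1)  # greedy policy from the learned action values
```

Because every iteration sweeps the whole batch and transitions are reused across iterations, no fresh simulator rollouts are needed, which is the source of the efficiency gain over online Q-learning described in the abstract.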
Pages: 19
Related Papers
50 records
  • [1] Routing an Autonomous Taxi with Reinforcement Learning
    Han, Miyoung
    Senellart, Pierre
    Bressan, Stephane
    Wu, Huayu
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 2421 - 2424
  • [2] A Markov decision process approach to vacant taxi routing with e-hailing
    Yu, Xinlian
    Gao, Song
    Hu, Xianbiao
    Park, Hyoshin
    TRANSPORTATION RESEARCH PART B-METHODOLOGICAL, 2019, 121 : 114 - 134
  • [3] A Deep Reinforcement Learning Approach for Global Routing
    Liao, Haiguang
    Zhang, Wentai
    Dong, Xuliang
    Poczos, Barnabas
    Shimada, Kenji
    Kara, Levent Burak
    JOURNAL OF MECHANICAL DESIGN, 2020, 142 (06)
  • [4] A two-stage approach to modeling vacant taxi movements
    Wong, R. C. P.
    Szeto, W. Y.
    Wong, S. C.
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2015, 59 : 147 - 163
  • [5] A Two-Stage Approach to Modeling Vacant Taxi Movements
    Wong, R. C. P.
    Szeto, W. Y.
    Wong, S. C.
    21ST INTERNATIONAL SYMPOSIUM ON TRANSPORTATION AND TRAFFIC THEORY, 2015, 7 : 254 - 275
  • [6] Dispatch of autonomous vehicles for taxi services: A deep reinforcement learning approach
    Mao, Chao
    Liu, Yulin
    Shen, Zuo-Jun
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2020, 115
  • [7] An Integrated Reinforcement Learning and Centralized Programming Approach for Online Taxi Dispatching
    Liang, Enming
    Wen, Kexin
    Lam, William H. K.
    Sumalee, Agachai
    Zhong, Renxin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (09) : 4742 - 4756
  • [8] A reinforcement learning approach to optimising batch process control
    Wilson, JA
    Martinez, EC
    ADVANCES IN PROCESS CONTROL 5, 1998, : 53 - 60
  • [9] A Reinforcement Learning Approach for Interdomain Routing with Link Prices
    Vrancx, Peter
    Gurzi, Pasquale
    Rodriguez, Abdel
    Steenhaut, Kris
    Nowe, Ann
    ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS, 2015, 10 (01)
  • [10] Good or Mediocre? A Deep Reinforcement Learning Approach for Taxi Revenue Efficiency Optimization
    Wang, Haotian
    Rong, Huigui
    Zhang, Qun
    Liu, Daibo
    Hu, Chunhua
    Hu, Yupeng
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2020, 7 (04): : 3018 - 3027