Exploring Deep Reinforcement Learning for Task Dispatching in Autonomous On-Demand Services

Cited by: 4
Authors
Yang, Lei [1]
Yu, Xi [1]
Cao, Jiannong [2]
Liu, Xuxun [3]
Zhou, Pan [4]
Affiliations
[1] South China Univ Technol, Sch Software Engn, 382 Waihuandong Rd, Guangzhou 510006, Guangdong, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Kowloon, 11 Yucai Rd, Hong Kong, Peoples R China
[3] South China Univ Technol, Sch Elect & Informat Engn, 382 Waihuandong Rd, Guangzhou 510006, Guangdong, Peoples R China
[4] Huazhong Univ Sci & Technol, Hubei Engn Res Ctr Big Data Secur, Sch Cyber Sci & Engn, 1037 Luoyu Rd, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Demand dispatching; on-demand services; deep reinforcement learning; assignment
DOI
10.1145/3442343
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Autonomous on-demand services, such as GOGOX (formerly GoGoVan) in Hong Kong, provide a platform for users to request services and for suppliers to meet those demands. On such a platform, suppliers are free to accept or reject the demands dispatched to them, which makes online matching between demands and suppliers challenging. Existing methods dispatch demands in rounds. In these works, the dispatching decision is based on the predicted response patterns of suppliers to the demands in the current round, but none of them considers the impact of future demands and suppliers on the current decision. As a result, a decision that looks good in the current round can be suboptimal in the long run. To solve this problem, we propose a novel demand dispatching model using deep reinforcement learning. In this model, we treat each demand as an agent. The action of each agent, i.e., the dispatching decision for each demand, is determined by a centralized algorithm in a coordinated way. The model works in two steps. (1) It learns the expected value of a demand in each spatiotemporal state from historical transition data. (2) Based on the learned values, it performs many-to-many dispatching with a combinatorial optimization algorithm that considers both the immediate rewards and the expected values of demands in the next round. To obtain a higher total reward, demands with a high expected future value (short response time) may be delayed to the next round. Conversely, demands with a low expected future value (long response time) are dispatched immediately. Through extensive experiments on real-world datasets, we show that the proposed model outperforms existing models in terms of Cancellation Rate and Average Response Time.
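The two steps in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It makes these simplifying assumptions: a lookup table updated with TD(0) stands in for the deep value network, an exhaustive one-to-one assignment stands in for the many-to-many combinatorial optimizer, and every name and number here (`learn_values`, `dispatch`, the zones "A" and "B", the reward values) is hypothetical.

```python
from itertools import permutations


def learn_values(transitions, gamma=0.9, alpha=0.1, epochs=200):
    """Step 1 (simplified): tabular TD(0) over spatiotemporal states.

    Each transition is (state, reward, next_state); next_state is None
    when the episode ends. Update: V(s) += alpha * (r + gamma*V(s') - V(s)).
    """
    values = {}
    for _ in range(epochs):
        for state, reward, next_state in transitions:
            v = values.get(state, 0.0)
            v_next = 0.0 if next_state is None else values.get(next_state, 0.0)
            values[state] = v + alpha * (reward + gamma * v_next - v)
    return values


def dispatch(demands, suppliers, immediate_reward, wait_value, gamma=0.9):
    """Step 2 (simplified): give each demand a distinct supplier or defer it.

    A dispatched pair scores its immediate reward; a deferred demand
    (supplier None) scores the discounted value of waiting one round.
    Brute force, so only suitable for tiny illustrative inputs.
    """
    options = list(suppliers) + [None] * len(demands)
    best_score, best_plan = float("-inf"), None
    for choice in permutations(options, len(demands)):
        score = sum(
            gamma * wait_value[d] if s is None else immediate_reward[(d, s)]
            for d, s in zip(demands, choice)
        )
        if score > best_score:
            best_score, best_plan = score, dict(zip(demands, choice))
    return best_plan, best_score


# Hypothetical history: states are (zone, round). Zone "A" pays off well
# one round later, zone "B" pays off poorly.
transitions = [
    (("A", 0), 0.0, ("A", 1)),
    (("A", 1), 1.0, None),
    (("B", 0), 0.0, ("B", 1)),
    (("B", 1), 0.2, None),
]
values = learn_values(transitions)

# Two demands compete for one supplier in the current round.
demands, suppliers = ["d1", "d2"], ["s1"]
immediate_reward = {("d1", "s1"): 0.8, ("d2", "s1"): 0.7}
wait_value = {"d1": values[("A", 1)], "d2": values[("B", 1)]}

plan, score = dispatch(demands, suppliers, immediate_reward, wait_value)
print(plan)  # {'d1': None, 'd2': 's1'}
```

In this toy run, `d1` is deferred even though it has the higher immediate reward, because its learned value in the next round is high, while `d2` is dispatched at once because waiting is worth little to it. This mirrors the delay-or-dispatch behavior the abstract describes.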
Pages: 23