A delay-robust method for enhanced real-time reinforcement learning

Cited by: 0
Authors
[1] Xia, Bo
[2] Sun, Haoyuan
[3] Yuan, Bo
[4] Li, Zhiheng
[5] Liang, Bin
[6] Wang, Xueqian
Keywords
Markov processes
DOI
10.1016/j.neunet.2024.106769
Abstract
In reinforcement learning, the Markov Decision Process (MDP) framework typically operates under a blocking paradigm, assuming a static environment during the agent's decision-making and stationary agent behavior while the environment executes its actions. This static model often proves inadequate for real-time tasks, as it lacks the flexibility to handle concurrent changes in both the agent's decision-making process and the environment's dynamic responses. Contemporary solutions, such as linear interpolation or state space augmentation, attempt to address the asynchronous nature of delayed states and actions in real-time environments. However, these methods frequently require precise delay measurements and may fail to fully capture the complexities of delay dynamics. To address these challenges, we introduce a minimal information set that encapsulates concurrent information during agent-environment interactions, serving as the foundation of our real-time decision-making framework. The traditional blocking-mode MDP is then reformulated as a Minimal Information State Markov Decision Process (MISMDP), aligning more closely with the demands of real-time environments. Within this MISMDP framework, we propose the Minimal information set for Real-time tasks using Actor-Critic (MRAC), a general approach for addressing delay issues in real-time tasks, supported by a rigorous theoretical analysis of Q-function convergence. Extensive experiments across both discrete and continuous action space environments demonstrate that MRAC outperforms state-of-the-art algorithms, delivering superior performance and generalization in managing delays within real-time tasks. © 2024
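As a rough illustration of the state space augmentation baseline that the abstract mentions (not of MISMDP or MRAC, whose details are not given here), the following sketch delays every action by a fixed number of steps and exposes the queue of pending actions as part of the state. It assumes a constant, known integer delay, which is exactly the kind of precise delay measurement the abstract identifies as a limitation. `ToyLineEnv`, `DelayedActionWrapper`, and all parameters are hypothetical names introduced only for this example.

```python
from collections import deque
from typing import Deque, Tuple

AugmentedState = Tuple[int, Tuple[int, ...]]


class ToyLineEnv:
    """Hypothetical 1-D environment: the state is an integer position."""

    def __init__(self) -> None:
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float]:
        # action is -1, 0, or +1; the reward peaks (at 0) when pos == 3
        self.pos += action
        return self.pos, -float(abs(self.pos - 3))


class DelayedActionWrapper:
    """Applies each action `delay` steps late (assumed constant and known)
    and returns the augmented state (observation, pending actions)."""

    def __init__(self, env: ToyLineEnv, delay: int, noop: int = 0) -> None:
        self.env = env
        self.delay = delay
        self.noop = noop
        self.pending: Deque[int] = deque()

    def _augment(self, obs: int) -> AugmentedState:
        return obs, tuple(self.pending)

    def reset(self) -> AugmentedState:
        obs = self.env.reset()
        # Until the first real action arrives, the environment receives no-ops.
        self.pending = deque([self.noop] * self.delay)
        return self._augment(obs)

    def step(self, action: int) -> Tuple[AugmentedState, float]:
        self.pending.append(action)       # newest action joins the queue
        applied = self.pending.popleft()  # oldest action reaches the environment
        obs, reward = self.env.step(applied)
        return self._augment(obs), reward


env = DelayedActionWrapper(ToyLineEnv(), delay=2)
state = env.reset()
for _ in range(3):
    state, reward = env.step(+1)
    print(state, reward)
```

Because the pending actions are part of the state, the next augmented state depends only on the current augmented state and the newly chosen action, so the delayed problem is again Markov. The cost is that the state grows with the delay and the delay must be known in advance.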