A delay-robust method for enhanced real-time reinforcement learning

Cited by: 0

Authors:

[1] Xia, Bo

[2] Sun, Haoyuan

[3] Yuan, Bo

[4] Li, Zhiheng

[5] Liang, Bin

[6] Wang, Xueqian

Source:

Wang, Xueqian (wang.xq@sz.tsinghua.edu.cn) | 2025 / Vol. 181

Keywords:

Markov processes;

DOI:

10.1016/j.neunet.2024.106769

Abstract:

In reinforcement learning, the Markov Decision Process (MDP) framework typically operates under a blocking paradigm, assuming a static environment during the agent's decision-making and stationary agent behavior while the environment executes its actions. This static model often proves inadequate for real-time tasks, as it lacks the flexibility to handle concurrent changes in both the agent's decision-making process and the environment's dynamic responses. Contemporary solutions, such as linear interpolation or state space augmentation, attempt to address the asynchronous nature of delayed states and actions in real-time environments. However, these methods frequently require precise delay measurements and may fail to fully capture the complexities of delay dynamics. To address these challenges, we introduce a minimal information set that encapsulates concurrent information during agent-environment interactions, serving as the foundation of our real-time decision-making framework. The traditional blocking-mode MDP is then reformulated as a Minimal Information State Markov Decision Process (MISMDP), aligning more closely with the demands of real-time environments. Within this MISMDP framework, we propose the Minimal information set for Real-time tasks using Actor-Critic (MRAC), a general approach for addressing delay issues in real-time tasks, supported by a rigorous theoretical analysis of Q-function convergence. Extensive experiments across both discrete and continuous action space environments demonstrate that MRAC outperforms state-of-the-art algorithms, delivering superior performance and generalization in managing delays within real-time tasks. © 2024
