Draw on advantages and avoid disadvantages by making a multi-step prediction

Cited by: 1
Authors
Zhu, Guofeng [1]
Zhu, Fei [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Reinforcement learning; Exploration; Intrinsic reward; Multi-step prediction; Policy optimization; Curiosity
DOI
10.1016/j.eswa.2023.121345
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning learns about the environment through the process of exploration, and the sufficient information collected during the interaction helps the agent to predict the future situation. However, uncontrolled exploration may cause the agent to stray into dangerous regions of the environment, leading to bad decisions and impairing the agent's performance. In order to address the issue, a framework, referred to as the policy guided by multi-step prediction (PGMP), is proposed. PGMP utilizes a curiosity mechanism based on multi-step prediction errors to stimulate exploration. To encourage the agent to explore safe or task-relevant areas, a safety bonus model is designed to determine whether the exploration area is safe or not by predicting the possible reward that can be gained. The combination of two intrinsic rewards serves as a curiosity model to give high returns to unknown states and possible safe actions. In addition, to avoid possible dangers in a limited number of future steps during exploration, a looking-ahead model is introduced to predict future multi-step states, actions, and rewards, respectively. Then, future information is combined with the policy network and included in the loss function of the policy update, allowing the agent to optimize its policy for predicted future states. Experiments on several tasks demonstrated that the proposed PGMP framework significantly improves the agent's performance.
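The abstract describes the curiosity signal only in words, so the following is a minimal sketch of how an intrinsic reward built from a multi-step prediction error and a safety bonus might look. It is not the authors' implementation: the function names, the mean-squared-error form, the clipping of the predicted reward at zero, and the weight `beta` are all assumptions made for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]
# Hypothetical forward model: maps (state, action) to a predicted next state.
ForwardModel = Callable[[State, float], State]


def multi_step_prediction_error(
    model: ForwardModel,
    state: State,
    actions: Sequence[float],
    observed_states: Sequence[State],
) -> float:
    """Roll the model forward over several steps and return the mean
    squared error against the states that were actually observed."""
    if not observed_states or len(actions) != len(observed_states):
        raise ValueError("need one observed state per action")
    total = 0.0
    predicted = state
    for action, actual in zip(actions, observed_states):
        # Each prediction is fed back in, so errors compound over the horizon.
        predicted = model(predicted, action)
        total += sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return total / len(observed_states)


def intrinsic_reward(
    prediction_error: float, predicted_reward: float, beta: float = 0.5
) -> float:
    """Combine the novelty term with a safety bonus.

    Assumption: the safety bonus is the predicted extrinsic reward clipped
    at zero, weighted by `beta`. The abstract does not specify this form.
    """
    safety_bonus = max(predicted_reward, 0.0)
    return prediction_error + beta * safety_bonus


# Toy 1-D example: the true dynamics are s + a, the model underestimates.
def toy_model(s: State, a: float) -> State:
    return [s[0] + 0.9 * a]


error = multi_step_prediction_error(
    toy_model, [0.0], [1.0, 1.0], [[1.0], [2.0]]
)
bonus_reward = intrinsic_reward(error, predicted_reward=1.0)
```

In this toy setting the compounded two-step error is larger than a single-step error would be, which illustrates why a multi-step criterion can flag poorly modelled regions more strongly. A region with a negative predicted reward receives no safety bonus under the clipping assumed here.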
Pages: 15
Related papers
50 in total
  • [21] Fuzzy multi-step ahead prediction of VBR video sources
    Qiu, B
    Zhang, LR
    Wu, HR
    ICICS - PROCEEDINGS OF 1997 INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, VOLS 1-3: THEME: TRENDS IN INFORMATION SYSTEMS ENGINEERING AND WIRELESS MULTIMEDIA COMMUNICATIONS, 1997: 1623 - 1626
  • [22] Iterative multi-step prediction model based on theory of evidence
    Hong, Bei
    Hu, Chang-Hua
    Jiang, Xue-Peng
    Kongzhi Lilun Yu Yingyong/Control Theory and Applications, 2010, 27 (12): 1737 - 1742
  • [23] Notes on multi-step ahead prediction based on the principle of concatenation
    Rossiter, J.A.
    Proceedings of the Institution of Mechanical Engineers. Part I, Journal of systems and control engineering, 1993, 207 (04) : 261 - 263
  • [24] Multi-step forward intelligent prediction of tool wear condition
    Zhu, Kunpeng
    Huang, Chengyi
    Li, Jun
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2024, 30 (09): 3038 - 3049
  • [25] Learning multi-step prediction models for receding horizon control
    Terzi, Enrico
    Fagiano, Lorenzo
    Farina, Marcello
    Scattolini, Riccardo
    2018 EUROPEAN CONTROL CONFERENCE (ECC), 2018: 1335 - 1340
  • [26] Enhancing multi-step action prediction for active object detection
    Fang, Fen
    Xu, Qianli
    Gauthier, Nicolas
    Li, Liyuan
    Lim, Joo-Hwee
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021: 2189 - 2193
  • [27] A Multi-step Ahead Dyadic Particle Filter for Price Prediction
    Ntemi, Myrsini
    Kotropoulos, Constantine
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021: 1516 - 1520
  • [28] Multi-step time series prediction intervals using neuroevolution
    Cortez, Paulo
    Pereira, Pedro Jose
    Mendes, Rui
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (13): 8939 - 8953
  • [29] Multi-step prediction of time series with random missing data
    Wu, Xuedong
    Wang, Yaonan
    Mao, Jianxu
    Du, Zhaoping
    Li, Chunhua
    APPLIED MATHEMATICAL MODELLING, 2014, 38 (14) : 3512 - 3522
  • [30] Multi-step prediction error approach for controller performance monitoring
    Zhao, Yu
    Chu, Jian
    Su, Hongye
    Huang, Biao
    CONTROL ENGINEERING PRACTICE, 2010, 18 (01) : 1 - 12