Draw on advantages and avoid disadvantages by making a multi-step prediction

Cited by: 1
Authors
Zhu, Guofeng [1 ]
Zhu, Fei [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Exploration; Intrinsic reward; Multi-step prediction; Policy optimization; EXPLORATION; CURIOSITY;
DOI
10.1016/j.eswa.2023.121345
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning learns about the environment through exploration, and the information collected during interaction helps the agent predict future situations. However, uncontrolled exploration may cause the agent to stray into dangerous regions of the environment, leading to poor decisions and impairing its performance. To address this issue, a framework referred to as policy guided by multi-step prediction (PGMP) is proposed. PGMP uses a curiosity mechanism based on multi-step prediction errors to stimulate exploration. To encourage the agent to explore safe or task-relevant areas, a safety bonus model is designed that judges whether an exploration area is safe by predicting the reward that can be gained there. The combination of the two intrinsic rewards serves as a curiosity model that assigns high returns to unknown states and likely safe actions. In addition, to avoid possible dangers within a limited number of future steps during exploration, a looking-ahead model is introduced to predict future multi-step states, actions, and rewards. This future information is then combined with the policy network and included in the loss function of the policy update, allowing the agent to optimize its policy with respect to predicted future states. Experiments on several tasks demonstrate that the proposed PGMP framework significantly improves the agent's performance.
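The abstract describes two intrinsic-reward components, a multi-step prediction-error curiosity term and a safety bonus derived from predicted reward, that are combined to guide exploration. The PyTorch sketch below illustrates one plausible way such a combination could be computed under stated assumptions; it is not the authors' implementation, and all names and hyperparameters (forward_model, safety_model, k_steps, beta, lambda_safe) are illustrative.

```python
# Minimal sketch (not the PGMP authors' code) of combining a multi-step
# prediction-error curiosity term with a safety bonus from predicted reward.
# All module names and weights below are illustrative assumptions.
import torch
import torch.nn as nn


class CuriosityBonus(nn.Module):
    """Intrinsic reward = k-step prediction error + predicted-reward safety bonus."""

    def __init__(self, state_dim, action_dim, hidden=128, k_steps=3,
                 beta=0.1, lambda_safe=0.05):
        super().__init__()
        self.k_steps = k_steps
        self.beta = beta                # weight of the prediction-error (curiosity) term
        self.lambda_safe = lambda_safe  # weight of the safety (predicted-reward) term
        # Forward model: predicts the next state from (state, action).
        self.forward_model = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim))
        # Safety model: predicts the extrinsic reward obtainable from (state, action).
        self.safety_model = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def intrinsic_reward(self, states, actions):
        """states: (k_steps + 1, state_dim); actions: (k_steps, action_dim)."""
        pred_error = 0.0
        state = states[0]
        for t in range(self.k_steps):
            pred_next = self.forward_model(torch.cat([state, actions[t]], dim=-1))
            pred_error = pred_error + (pred_next - states[t + 1]).pow(2).mean()
            state = pred_next  # roll the prediction forward for the multi-step error
        safety = self.safety_model(torch.cat([states[0], actions[0]], dim=-1)).squeeze(-1)
        # High prediction error -> novel region; high predicted reward -> likely safe region.
        return self.beta * pred_error + self.lambda_safe * safety


# Example usage with dummy data (shapes only):
# bonus = CuriosityBonus(state_dim=4, action_dim=2)
# r_int = bonus.intrinsic_reward(torch.randn(4, 4), torch.randn(3, 2))
```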
Pages: 15
Related Papers
(50 items in total; first 10 shown)
  • [1] Advantages and Disadvantages in Making Gifts
    Boye, Erik C.
    TAX MAGAZINE, 1935, 13 (12) : 699 - +
  • [2] How stylisticians draw on narratology: Approaches, advantages and disadvantages
    Shen, Dan
    STYLE, 2005, 39 (04) : 381 - +
  • [3] Multi-step LSTM Prediction Model for Visibility Prediction
    Meng, Yunlong
    Qi, Fengliang
    Zuo, Heng
    Chen, Bo
    Yuan, Xian
    Xiao, Yao
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [4] Multi-step Prediction Algorithm for State Prediction Model
    Zhang, Zili
    Song, Hongwei
    SMART MATERIALS AND INTELLIGENT SYSTEMS, PTS 1 AND 2, 2011, 143-144 : 634 - 638
  • [5] NEW IMPROVEMENTS IN THE MULTI-STEP INVERSE FEM FOR THE FAST FORMABILITY PREDICTION OF MULTI-STEP AUTOBODY STAMPING PROCESS
    Tang, Bingtao
    Lu, Xiaoyang
    ENGINEERING PLASTICITY AND ITS APPLICATIONS, 2010, : 151 - 155
  • [6] A multi-step decision prediction model based on LightGBM
    Luo, Yuhao
    Xu, Qianfang
    Li, Wenliang
    Jiang, Feng
    Xiao, Bo
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 5714 - 5718
  • [7] Multi-step Ahead Prediction Using Neural Networks
    Pilka, Filip
    Oravec, Milos
    53RD INTERNATIONAL SYMPOSIUM ELMAR-2011, 2011, : 269 - 272
  • [8] Transfer Learning for Multi-Step Resource Utilization Prediction
    Parera, Claudia
    Liao, Qi
    Malanchini, Ilaria
    Wellington, Dan
    Redondi, Alessandro E. C.
    Cesana, Matteo
    2020 IEEE 31ST ANNUAL INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR AND MOBILE RADIO COMMUNICATIONS (IEEE PIMRC), 2020,
  • [9] Multi-step ahead prediction based on the principle of concatenation
    Kaynak, M.O.
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART I-JOURNAL OF SYSTEMS AND CONTROL ENGINEERING, 1993, 207 (01) : 57 - 61
  • [10] Multi-step Prediction of Physiological Tremor for Robotics Applications
    Veluvolu, K. C.
    Tatinati, S.
    Hong, S. M.
    Ang, W. T.
    2013 35TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2013, : 5075 - 5078