Draw on advantages and avoid disadvantages by making a multi-step prediction

Cited by: 1
Authors
Zhu, Guofeng [1]
Zhu, Fei [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Reinforcement learning; Exploration; Intrinsic reward; Multi-step prediction; Policy optimization; Curiosity
DOI
10.1016/j.eswa.2023.121345
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In reinforcement learning, an agent learns about its environment through exploration, and the information collected during interaction helps it predict future situations. However, uncontrolled exploration may lead the agent into dangerous regions of the environment, resulting in bad decisions and impaired performance. To address this issue, a framework called policy guided by multi-step prediction (PGMP) is proposed. PGMP uses a curiosity mechanism based on multi-step prediction errors to stimulate exploration. To encourage the agent to explore safe or task-relevant areas, a safety bonus model judges whether an exploration area is safe by predicting the reward that can be gained there. Together, these two intrinsic rewards form a curiosity model that gives high returns to unknown states and to actions that are likely to be safe. In addition, to avoid dangers within a limited number of future steps during exploration, a looking-ahead model predicts multi-step future states, actions, and rewards. This future information is then combined with the policy network and included in the loss function of the policy update, allowing the agent to optimize its policy for the predicted future states. Experiments on several tasks demonstrate that the proposed PGMP framework significantly improves the agent's performance.
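The abstract does not give formulas, so the following is only a minimal sketch of how a multi-step prediction error and a reward-based safety bonus might be combined into one intrinsic reward. Everything named here is an assumption made for illustration: the scalar state and action types, the functions `multi_step_prediction_error` and `intrinsic_reward`, the weights `beta` and `eta`, the choice of mean squared error, and the toy models in the usage lines. It is not the paper's actual formulation.

```python
from typing import Callable, Sequence

# Hypothetical scalar state/action types, chosen only to keep the sketch small.
State = float
Action = float


def multi_step_prediction_error(
    forward_model: Callable[[State, Action], State],
    states: Sequence[State],
    actions: Sequence[Action],
) -> float:
    """Mean squared error between a k-step model rollout and observed states.

    `states` holds s_t ... s_{t+k} (length k + 1); `actions` holds the k
    actions taken between them. Each prediction is fed back into the model,
    so errors compound over the rollout.
    """
    if not actions:
        raise ValueError("need at least one action")
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    predicted = states[0]
    total = 0.0
    for action, observed_next in zip(actions, states[1:]):
        predicted = forward_model(predicted, action)
        total += (predicted - observed_next) ** 2
    return total / len(actions)


def intrinsic_reward(
    forward_model: Callable[[State, Action], State],
    reward_model: Callable[[State, Action], float],
    states: Sequence[State],
    actions: Sequence[Action],
    beta: float = 1.0,  # assumed weight on the curiosity term
    eta: float = 0.5,   # assumed weight on the safety bonus
) -> float:
    """Weighted sum of curiosity (prediction error) and predicted reward."""
    curiosity = multi_step_prediction_error(forward_model, states, actions)
    safety_bonus = reward_model(states[0], actions[0])
    return beta * curiosity + eta * safety_bonus


# Toy usage: the true dynamics are s' = s + a, the learned model is slightly off.
observed_states = [0.0, 1.0, 2.0, 3.0]
taken_actions = [1.0, 1.0, 1.0]
imperfect_model = lambda s, a: s + 0.9 * a
toy_reward_model = lambda s, a: 1.0 if s + a >= 0 else -1.0

bonus = intrinsic_reward(
    imperfect_model, toy_reward_model, observed_states, taken_actions
)
```

In this sketch a less accurate forward model yields a larger curiosity term, while the predicted reward raises or lowers the total depending on whether the hypothetical reward model considers the step worthwhile.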
Pages: 15