Draw on advantages and avoid disadvantages by making a multi-step prediction

Cited by: 1
Authors
Zhu, Guofeng [1]
Zhu, Fei [1]
Affiliation
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Reinforcement learning; Exploration; Intrinsic reward; Multi-step prediction; Policy optimization; EXPLORATION; CURIOSITY
DOI
10.1016/j.eswa.2023.121345
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning agents learn about the environment through exploration, and the information collected during interaction helps the agent predict future situations. However, uncontrolled exploration may lead the agent into dangerous regions of the environment, resulting in bad decisions and impaired performance. To address this issue, a framework called policy guided by multi-step prediction (PGMP) is proposed. PGMP uses a curiosity mechanism based on multi-step prediction errors to stimulate exploration. To encourage the agent to explore safe or task-relevant areas, a safety bonus model judges whether an exploration area is safe by predicting the reward that can be gained there. The two intrinsic rewards are combined into a curiosity model that gives high returns to unknown states and to actions that are likely safe. In addition, to avoid dangers within a limited number of future steps during exploration, a looking-ahead model predicts future multi-step states, actions, and rewards. This future information is then combined with the policy network and included in the loss function of the policy update, allowing the agent to optimize its policy for predicted future states. Experiments on several tasks show that the proposed PGMP framework significantly improves the agent's performance.
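The abstract does not give formulas, so the following is only a minimal sketch of how a curiosity term based on multi-step prediction error might be combined with a reward-based safety bonus. Every name here (`multi_step_prediction_error`, `safety_bonus`, `intrinsic_reward`, and the weights `beta` and `eta`) is hypothetical, and the mean-squared-error and clipped-reward forms are illustrative assumptions, not the definitions used in the paper.

```python
from typing import Sequence

State = Sequence[float]


def multi_step_prediction_error(predicted: Sequence[State],
                                observed: Sequence[State]) -> float:
    """Mean squared error between predicted and observed states over k steps.

    Assumption: the curiosity signal is an MSE; the paper may define it differently.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need the same non-zero number of predicted and observed states")
    total = 0.0
    count = 0
    for pred_state, obs_state in zip(predicted, observed):
        for p, o in zip(pred_state, obs_state):
            total += (p - o) ** 2
            count += 1
    return total / count


def safety_bonus(predicted_reward: float) -> float:
    """Hypothetical bonus: only a positive predicted reward earns a bonus."""
    return max(0.0, predicted_reward)


def intrinsic_reward(predicted: Sequence[State],
                     observed: Sequence[State],
                     predicted_reward: float,
                     beta: float = 1.0,
                     eta: float = 1.0) -> float:
    """Weighted sum of the curiosity term and the safety bonus (assumed form)."""
    curiosity = multi_step_prediction_error(predicted, observed)
    return beta * curiosity + eta * safety_bonus(predicted_reward)


# Two predicted steps compared with what was actually observed.
predicted_states = [[0.0, 0.0], [1.0, 1.0]]
observed_states = [[0.0, 1.0], [1.0, 3.0]]
bonus = intrinsic_reward(predicted_states, observed_states, predicted_reward=0.5)
```

In this sketch, poorly predicted (novel) states raise the intrinsic reward, while a state whose predicted reward is negative receives no safety bonus, which loosely mirrors the abstract's aim of rewarding both unknown states and actions that are likely safe.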
Pages: 15