Draw on advantages and avoid disadvantages by making a multi-step prediction

Cited by: 1

Authors:

Zhu, Guofeng [1]

Zhu, Fei [1]

Affiliation:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 237

Funding:

National Natural Science Foundation of China;

Keywords:

Reinforcement learning; Exploration; Intrinsic reward; Multi-step prediction; Policy optimization; Curiosity

DOI:

10.1016/j.eswa.2023.121345

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement learning learns about the environment through the process of exploration, and the sufficient information collected during the interaction helps the agent to predict the future situation. However, uncontrolled exploration may cause the agent to stray into dangerous regions of the environment, leading to bad decisions and impairing the agent's performance. In order to address the issue, a framework, referred to as the policy guided by multi-step prediction (PGMP), is proposed. PGMP utilizes a curiosity mechanism based on multi-step prediction errors to stimulate exploration. To encourage the agent to explore safe or task-relevant areas, a safety bonus model is designed to determine whether the exploration area is safe or not by predicting the possible reward that can be gained. The combination of two intrinsic rewards serves as a curiosity model to give high returns to unknown states and possible safe actions. In addition, to avoid possible dangers in a limited number of future steps during exploration, a looking-ahead model is introduced to predict future multi-step states, actions, and rewards, respectively. Then, future information is combined with the policy network and included in the loss function of the policy update, allowing the agent to optimize its policy for predicted future states. Experiments on several tasks demonstrated that the proposed PGMP framework significantly improves the agent's performance.
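As a rough illustration of the curiosity idea in the abstract, the following Python sketch scores a state by the error of a multi-step rollout and then adds a reward-based bonus. It is not the authors' implementation: the function names `multi_step_error` and `intrinsic_reward`, the toy one-dimensional dynamics, the weight `beta`, and the choice of a weighted sum to combine the two terms are all assumptions made for this example, since the abstract does not specify them.

```python
from typing import Callable, Sequence

State = Sequence[float]


def multi_step_error(
    model: Callable[[State, int], State],
    start: State,
    actions: Sequence[int],
    observed: Sequence[State],
) -> float:
    """Mean squared error of a k-step rollout against the observed states.

    The model's own prediction is fed back in at every step, so errors
    compound over the horizon (hypothetical stand-in for a learned model).
    """
    if not actions or len(actions) != len(observed):
        raise ValueError("actions and observed must be non-empty and equal in length")
    state = start
    total = 0.0
    for action, target in zip(actions, observed):
        state = model(state, action)  # predicted next state
        total += sum((p - t) ** 2 for p, t in zip(state, target))
    return total / len(actions)


def intrinsic_reward(curiosity: float, predicted_reward: float, beta: float = 0.5) -> float:
    """Combine curiosity with a safety bonus (assumed here to be a weighted sum)."""
    return curiosity + beta * predicted_reward


# Toy example: the model believes s' = s + a, but the true dynamics are s' = s + 2a.
def toy_model(state: State, action: int) -> State:
    return [state[0] + action]


curiosity = multi_step_error(toy_model, [0.0], [1, 1], [[2.0], [4.0]])
bonus = intrinsic_reward(curiosity, predicted_reward=-1.0, beta=0.5)
print(curiosity, bonus)
```

In this toy setting the rollout error is large because the model is wrong about the dynamics, while the negative predicted reward lowers the combined bonus, which mirrors the abstract's goal of rewarding novelty less in regions that look unsafe.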

Pages: 15
