A novel Q-learning algorithm based on improved whale optimization algorithm for path planning

Cited by: 6

Authors:

Li, Ying [1,2]

Wang, Hanyu [1,2]

Fan, Jiahao [3]

Geng, Yanyu [1,2]

Affiliations:

[1] Jilin Univ, Coll Comp Sci & Technol, Changchun, Peoples R China

[2] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun, Peoples R China

[3] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China

Source:

PLOS ONE | 2022 / Vol. 17 / Issue 12

Keywords:

MOTH-FLAME OPTIMIZATION; MOBILE ROBOT; MEGAPTERA-NOVAEANGLIAE; HUMPBACK WHALES; NAVIGATION; SONGS; POWER;

DOI:

10.1371/journal.pone.0279438

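Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];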

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Q-learning is a classical reinforcement learning algorithm and one of the most important methods for mobile robot path planning without a prior environmental model. However, standard Q-learning initializes the Q-table naively and spends too much time on exploration, which slows convergence. This paper proposes a new Q-learning algorithm, the Paired Whale Optimization Q-learning Algorithm (PWOQLA), which includes four improvements. First, to accelerate convergence, a whale optimization algorithm is used to initialize the values of the Q-table, so that a Q-table containing prior experience is learned before the exploration process begins. Second, to improve the local exploitation capability of the whale optimization algorithm, a paired whale optimization algorithm is proposed that uses a pairing strategy to speed up the search for prey. Third, to improve the exploration efficiency of Q-learning and reduce the number of useless explorations, a new selective exploration strategy is introduced that considers the relationship between the current position and the target position. Fourth, to balance exploration and exploitation so that Q-learning focuses on exploration in the early stage and on exploitation in the later stage, a nonlinear function is designed that dynamically changes the value of epsilon in epsilon-greedy Q-learning based on the number of iterations. Experimental comparisons with other path planning algorithms show that PWOQLA achieves higher accuracy and faster convergence in mobile robot path planning. The code will be released at https://github.com/wanghanyu0526/improveQL.git.
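
To illustrate the fourth improvement described in the abstract, the following is a minimal sketch of epsilon-greedy Q-learning in which epsilon decays nonlinearly with the iteration count. It is not the authors' implementation: the abstract does not give the actual function, so the cosine-shaped decay, the function names (epsilon_schedule, choose_action, q_update), and all parameter values here are assumptions chosen only for illustration.

```python
import math
import random


def epsilon_schedule(iteration, max_iterations, eps_start=0.9, eps_end=0.05):
    """Hypothetical nonlinear decay: high epsilon early, low epsilon late.

    The cosine shape is an assumed stand-in, not the function used in the paper.
    """
    progress = iteration / max_iterations
    return eps_end + (eps_start - eps_end) * 0.5 * (1.0 + math.cos(math.pi * progress))


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update (in place)."""
    best_next = max(q_table[next_state])
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error


if __name__ == "__main__":
    rng = random.Random(0)
    q_table = [[0.0, 0.0], [1.0, 2.0]]  # toy table: 2 states x 2 actions
    for it in (0, 50, 100):
        print(it, round(epsilon_schedule(it, 100), 3))
    action = choose_action(q_table[1], epsilon=0.0, rng=rng)
    q_update(q_table, state=0, action=action, reward=1.0, next_state=1)
    print(q_table[0])
```

Under this assumed schedule, the agent explores almost uniformly at the start and acts almost greedily near the final iteration; any monotonically decreasing nonlinear function could be substituted without changing the rest of the loop.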

Pages: 30

Showing 10 of 50 records

[1] A Path Planning Algorithm for UAV Based on Improved Q-Learning
Yan, Chao
Xiang, Xiaojia
[J]. 2018 2ND INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION SCIENCES (ICRAS), 2018 : 46 - 50
[2] Coverage Path Planning Optimization Based on Q-Learning Algorithm
Piardi, Luis
Lima, Jose
Pereira, Ana I.
Costa, Paulo
[J]. INTERNATIONAL CONFERENCE ON NUMERICAL ANALYSIS AND APPLIED MATHEMATICS (ICNAAM-2018), 2019, 2116
[3] PATH PLANNING OF MOBILE ROBOT BASED ON THE IMPROVED Q-LEARNING ALGORITHM
Chen, Chaorui
Wang, Dongshu
[J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2022, 18 (03) : 687 - 702
[4] Indoor Emergency Path Planning Based on the Q-Learning Optimization Algorithm
Xu, Shenghua
Gu, Yang
Li, Xiaoyan
Chen, Cai
Hu, Yingyi
Sang, Yu
Jiang, Wenxing
[J]. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2022, 11 (01)
[5] ETQ-learning: an improved Q-learning algorithm for path planning
Wang, Huanwei
Jing, Jing
Wang, Qianlv
He, Hongqi
Qi, Xuyan
Lou, Rui
[J]. INTELLIGENT SERVICE ROBOTICS, 2024, 17 (04) : 915 - 929
[6] Path planning for unmanned surface vehicle based on improved Q-Learning algorithm
Wang, Yuanhui
Lu, Changzhou
Wu, Peng
Zhang, Xiaoyue
[J]. OCEAN ENGINEERING, 2024, 292
[7] Path planning of UAVs based on improved whale optimization algorithm
Wu K.
Tan S.
[J]. Wu, Kun (wukun@buaa.edu.cn), 1600, Chinese Society of Astronautics (41):
[8] UAV Path Planning based on Improved Whale Optimization Algorithm
Liu, Kun
Xv, Cheng
Huang, Daqing
Ye, Xinning
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS AND COMPUTER ENGINEERING (ICCECE), 2021 : 569 - 573
[9] A novel Q-Learning Algorithm Based on the Stochastic Environment Path Planning Problem
Jian, Li
Rong, Fei
Yu, Tang
[J]. 2020 IEEE 19TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2020), 2020 : 1977 - 1982
[10] Dynamic Path Planning of a Mobile Robot with Improved Q-Learning algorithm
Li, Siding
Xu, Xin
Zuo, Lei
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015 : 409 - 414
