Efficient Jamming Policy Generation Method Based on Multi-Timescale Ensemble Q-Learning

Cited: 0
|
Authors
Qian, Jialong [1 ]
Zhou, Qingsong [1 ]
Li, Zhihui [1 ]
Yang, Zhongping [1 ]
Shi, Shasha [1 ]
Xu, Zhenjia [1 ]
Xu, Qiyun [2 ]
Affiliations
[1] Natl Univ Def Technol, Coll Elect Engn, Hefei 230037, Peoples R China
[2] PLA, Unit 93216, Beijing 100085, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
jamming policy generation; multifunctional radar; Q-learning; multi-timescale ensemble; RADAR;
DOI
10.3390/rs16173158
Chinese Library Classification (CLC)
X [Environmental Science, Safety Science];
Discipline Classification Code
08 ; 0830 ;
Abstract
With the advancement of radar technology toward multifunctionality and cognitive capability, traditional radar countermeasures are no longer sufficient to counter advanced multifunctional radar (MFR) systems. Rapid and accurate generation of an optimal jamming strategy is one of the key technologies for efficient radar countermeasures. To enhance the efficiency and accuracy of jamming policy generation, this paper proposes an efficient jamming policy generation method based on multi-timescale ensemble Q-learning (MTEQL). First, the task of generating jamming strategies is framed as a Markov decision process (MDP) by constructing a countermeasure scenario between the jammer and the radar and analyzing the principles of radar operation mode transitions. Then, multiple structure-dependent Markov environments are created from the real-world adversarial interactions between jammers and radars. Q-learning algorithms are executed concurrently in these environments, and their results are merged through an adaptive weighting mechanism based on the Jensen-Shannon divergence (JSD). Ultimately, a low-complexity, near-optimal jamming policy is derived. Simulation results indicate that the proposed method outperforms the standard Q-learning algorithm in jamming policy generation, achieving shorter decision-making times and a lower average strategy error rate.
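The ensemble step the abstract describes — parallel Q-learning runs fused by a JSD-based adaptive weighting — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's algorithm: the toy MDP (5 states, 3 actions, random dynamics), the perturbed-environment construction, and helper names such as `mteql_merge` are all assumptions made for the example.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def q_learning(P, R, rng, episodes=200, steps=20, alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning on an MDP with transitions P (S,A,S) and rewards R (S,A)."""
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    s = 0
    for _ in range(episodes):
        for _ in range(steps):
            a = rng.integers(A) if rng.random() < eps else int(Q[s].argmax())
            s2 = rng.choice(S, p=P[s, a])
            Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q

def softmax_policy(Q):
    z = np.exp(Q - Q.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def mteql_merge(Q_list):
    """Fuse Q-tables: learners whose softmax policies sit closer to the
    ensemble mean (lower average per-state JSD) get exponentially more weight."""
    policies = [softmax_policy(Q) for Q in Q_list]
    mean_pol = np.mean(policies, axis=0)
    divs = np.array([np.mean([jsd(pi[s], mean_pol[s]) for s in range(len(pi))])
                     for pi in policies])
    w = np.exp(-divs)
    w /= w.sum()
    Q_ens = sum(wi * Qi for wi, Qi in zip(w, Q_list))
    return Q_ens, w

rng = np.random.default_rng(42)
S, A, K = 5, 3, 4  # toy radar states, jamming actions, parallel environments
base_R = rng.uniform(0, 1, size=(S, A))
Q_list = []
for k in range(K):
    # Each environment draws its own dynamics, standing in for the paper's
    # structure-dependent Markov environments built from jammer-radar interaction.
    P = rng.dirichlet(np.ones(S), size=(S, A))
    Q_list.append(q_learning(P, base_R, rng))

Q_ens, w = mteql_merge(Q_list)
policy = Q_ens.argmax(axis=1)  # one jamming action per radar state
```

The exponential down-weighting of high-divergence learners mirrors the "adaptive weighting mechanism that utilizes the JSD" claimed in the abstract: outlier Q-tables contribute less to the fused policy, which keeps the merged strategy close to the consensus while remaining cheap to compute.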
Pages: 21
Related Papers
50 records
  • [1] Multi-Timescale Ensemble Q-Learning for Markov Decision Process Policy Optimization
    Bozkus, Talha
    Mitra, Urbashi
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 1427 - 1442
  • [2] An Enhanced Ensemble Learning Method for Sentiment Analysis based on Q-learning
    Savargiv, Mohammad
    Masoumi, Behrooz
    Keyvanpour, Mohammad Reza
    IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY-TRANSACTIONS OF ELECTRICAL ENGINEERING, 2024, 48 (03) : 1261 - 1277
  • [3] Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning
    Emami, Patrick
    Zhang, Xiangyu
    Biagioni, David
    Zamzam, Ahmed S.
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 2372 - 2378
  • [4] A Q-learning-based Multi-timescale Resilience Enhancement Approach for Power Grids with High Renewables
    Huang, Yanting
    Zhong, Qing
    Wang, Akang
    Lin, Shunjiang
    Peng, Chaoyi
    Lei, Shunbo
    2024 IEEE 2ND INTERNATIONAL CONFERENCE ON POWER SCIENCE AND TECHNOLOGY, ICPST 2024, 2024, : 1919 - 1924
  • [5] Design of cognitive radar jamming based on Q-learning algorithm
    Li, Yun-Jie
    Zhu, Yun-Peng
    Gao, Mei-Guo
    Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology, 2015, 35 (11): : 1194 - 1199
  • [6] Q-learning intelligent jamming decision algorithm based on efficient upper confidence bound variance
    Rao N.
    Xu H.
    Song B.
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2022, 54 (05): : 162 - 170
  • [7] A Multi-Parameter Intelligent Communication Anti-Jamming Method Based on Three-Dimensional Q-Learning
    Pu, Ziming
    Niu, Yingtao
    Zhang, Guoliang
    2022 IEEE 2ND INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND ARTIFICIAL INTELLIGENCE (CCAI 2022), 2022, : 205 - 210
  • [8] Optimal method for the generation of the attack path based on the Q-learning decision
    Li T.
    Cao S.
    Yin S.
    Wei D.
    Ma X.
    Ma J.
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2021, 48 (01): : 160 - 167
  • [9] Q-Learning with probability based action policy
    Ugurlu, Ekin Su
    Biricik, Goksel
    2006 IEEE 14TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1 AND 2, 2006, : 210 - +
  • [10] Cooperative Q-Learning Based on Maturity of the Policy
    Yang, Mao
    Tian, Yantao
    Liu, Xiaomei
    2009 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, VOLS 1-7, CONFERENCE PROCEEDINGS, 2009, : 1352 - 1356