H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning

Times cited: 4

Authors:

Li, Jinna [1, 2]

Xiao, Zhenfei [1]

Affiliations:

[1] Liaoning Shihua Univ, Sch Informat & Control Engn, Fushun 113001, Liaoning, Peoples R China

[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China

Source:

IEEE ACCESS | 2020 / Vol. 8 / Issue 08

Funding:

National Natural Science Foundation of China;

Keywords:

H-infinity control; off-policy Q-learning; game theory; Nash equilibrium; ZERO-SUM GAMES; STATIC OUTPUT-FEEDBACK; DIFFERENTIAL GRAPHICAL GAMES; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; POLE ASSIGNMENT; LINEAR-SYSTEMS; SYNCHRONIZATION; ALGORITHM; DESIGNS;

DOI:

10.1109/ACCESS.2020.2970760

CLC number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

This paper presents a novel off-policy game Q-learning algorithm to solve the H∞ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution is that the Q-learning strategy in the proposed algorithm is implemented as off-policy policy iteration rather than on-policy learning, since off-policy learning has some well-known advantages over on-policy learning. All players cooperate to minimize their common performance index while defeating the disturbance, which tries to maximize this performance index; they finally reach the Nash equilibrium of the game, at which the disturbance attenuation condition is satisfied. To find the Nash equilibrium solution, the H∞ control problem is first transformed into an optimal control problem. An off-policy Q-learning algorithm is then developed within the typical adaptive dynamic programming (ADP) and game architecture, so that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof is presented that the Nash equilibrium solution obtained with the proposed off-policy game Q-learning algorithm is unbiased. Comparative simulation results verify the effectiveness and demonstrate the advantages of the proposed method.
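As a minimal sketch of the off-policy idea summarized in the abstract, assuming a standard zero-sum linear-quadratic formulation (dynamics x_{k+1} = A x_k + B1 u1_k + B2 u2_k + E w_k, stage cost x'Qx + u1'R1 u1 + u2'R2 u2 - γ² w'w), the Python/NumPy code below learns a quadratic Q-function kernel H from one batch of data generated by a random behavior policy and reuses that batch to evaluate every new target policy. The system matrices, γ, sample size and all function names are hypothetical; the joint saddle-point update v = -H_vv^{-1} H_vx x is a common textbook choice and not necessarily the paper's exact update. A model-based policy-iteration reference is included only to check that the data-driven iterates agree with it.

```python
import numpy as np

# Hypothetical example system (NOT from the paper): players u1, u2 and disturbance w.
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # open-loop stable, so K = 0 is admissible
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[1.0], [0.0]])
E = np.array([[0.1], [0.1]])
B = np.hstack([B1, B2, E])               # stacked input matrix for v = [u1; u2; w]
Qx = np.eye(2)                           # state weight
gamma = 5.0                              # assumed attenuation level
Rbar = np.diag([1.0, 1.0, -gamma**2])    # R1, R2 and -gamma^2 for the disturbance
n, m = B.shape
p = n + m                                # dimension of z = [x; v]
iu, ju = np.triu_indices(p)


def features(z):
    """Quadratic basis with z' H z == features(z) @ h, h = upper triangle of H."""
    return np.where(iu == ju, 1.0, 2.0) * z[iu] * z[ju]


def unpack(h):
    """Rebuild the symmetric kernel H from its upper-triangular entries."""
    H = np.zeros((p, p))
    H[iu, ju] = h
    H[ju, iu] = h
    return H


def collect_data(N=300, seed=0):
    """One run under an exploratory behavior policy (random inputs for all agents)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    X, V, Xn = [], [], []
    for _ in range(N):
        v = rng.standard_normal(m)
        x_next = A @ x + B @ v
        X.append(x); V.append(v); Xn.append(x_next)
        x = x_next
    return np.array(X), np.array(V), np.array(Xn)


def evaluate_off_policy(K, X, V, Xn):
    """Least squares on z_k'Hz_k - z_{k+1}'Hz_{k+1} = r_k; z_{k+1} uses the TARGET policy -K x."""
    rows, r = [], []
    for x, v, xn in zip(X, V, Xn):
        z = np.concatenate([x, v])               # action actually applied (behavior)
        zn = np.concatenate([xn, -K @ xn])       # action the target policy would take
        rows.append(features(z) - features(zn))
        r.append(x @ Qx @ x + v @ Rbar @ v)      # stage cost, -gamma^2 w'w included
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(r), rcond=None)
    return unpack(h)


def improve(H):
    """Joint saddle-point update from dQ/dv = 0: v = -Hvv^{-1} Hvx x; returns K."""
    return np.linalg.solve(H[n:, n:], H[n:, :n])


def evaluate_model(K):
    """Model-based reference (needs A, B): P = Qx + K'Rbar K + Ac'P Ac, then H."""
    Ac = A - B @ K
    M = Qx + K.T @ Rbar @ K
    P = np.linalg.solve(np.eye(n * n) - np.kron(Ac.T, Ac.T), M.ravel()).reshape(n, n)
    return np.block([[Qx + A.T @ P @ A, A.T @ P @ B],
                     [B.T @ P @ A, Rbar + B.T @ P @ B]])


X, V, Xn = collect_data()                # data collected once, reused every iteration
K_data = np.zeros((m, n))
K_model = np.zeros((m, n))
for _ in range(10):
    K_data = improve(evaluate_off_policy(K_data, X, V, Xn))
    K_model = improve(evaluate_model(K_model))
print(np.round(K_data, 4))               # rows: gains of u1, u2 and w
```

Because the next-state action in the sketch's Bellman equation comes from the target policy rather than the exploratory one, the random probing inputs do not bias the least-squares estimate of H in this noise-free toy setting, which is in the spirit of the no-bias property claimed in the abstract.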

Pages: 28831–28846

Number of pages: 16