H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning

Times Cited: 4
Authors
Li, Jinna [1 ,2 ]
Xiao, Zhenfei [1 ]
Affiliations
[1] Liaoning Shihua Univ, Sch Informat & Control Engn, Fushun 113001, Liaoning, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
Source
IEEE ACCESS, 2020, Vol. 8, Issue 08
Funding
National Natural Science Foundation of China
Keywords
H-infinity control; off-policy Q-learning; game theory; Nash equilibrium; ZERO-SUM GAMES; STATIC OUTPUT-FEEDBACK; DIFFERENTIAL GRAPHICAL GAMES; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; POLE ASSIGNMENT; LINEAR-SYSTEMS; SYNCHRONIZATION; ALGORITHM; DESIGNS;
DOI
10.1109/ACCESS.2020.2970760
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
This paper presents a novel off-policy game Q-learning algorithm for solving the H∞ control problem of discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution is that the Q-learning strategy is implemented via off-policy policy iteration rather than on-policy learning, since off-policy learning has well-known advantages over on-policy learning. All players cooperate to minimize their common performance index while counteracting the disturbance, which tries to maximize that same index, until they reach the Nash equilibrium of the game, at which the disturbance attenuation condition is satisfied. To find the Nash equilibrium solution, the H∞ control problem is first transformed into an optimal control problem. An off-policy Q-learning algorithm is then developed within the standard adaptive dynamic programming (ADP) and game-theoretic architecture, so that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof is presented that the proposed off-policy game Q-learning algorithm yields an unbiased solution to the Nash equilibrium. Comparative simulation results verify the effectiveness and demonstrate the advantages of the proposed method.
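The abstract describes the algorithm only at a high level: recast the H∞ problem as a zero-sum game, then run off-policy policy iteration on measured data. Purely as an illustration of that structure, below is a minimal Python sketch for a zero-sum linear-quadratic special case with a single control player against the disturbance (the multi-player case of the paper would stack each player's input into the joint input vector in the same way). The system matrices A, B, E, the weights Q, R, and the attenuation level gamma^2 are invented for demonstration; none of this is the authors' code.

```python
# Illustrative sketch only (not the authors' code): off-policy Q-learning
# policy iteration for a zero-sum linear-quadratic game, mirroring the
# structure the abstract describes.  A, B, E, Q, R, and gamma^2 are all
# invented assumptions for demonstration.
import numpy as np

rng = np.random.default_rng(0)

# Assumed system x_{k+1} = A x_k + B u_k + E w_k (treated as unknown: the
# learner only ever sees the sampled trajectory data collected below).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.1], [0.0]])
n, m, q = 2, 1, 1
Q, R = np.eye(n), np.eye(m)
gamma2 = 4.0  # squared attenuation level; must exceed the optimal gamma^2

def basis(z):
    # Quadratic basis such that z' H z = theta . basis(z) for symmetric H.
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(len(z)) for j in range(i, len(z))])

def unpack(theta, d):
    # Rebuild the symmetric Q-function kernel H from its packed parameters.
    H, k = np.zeros((d, d)), 0
    for i in range(d):
        for j in range(i, d):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

# One batch of exploratory data under a behavior policy (pure probing noise);
# off-policy learning lets every later iteration reuse this same batch.
N, data = 400, []
x = np.array([1.0, -1.0])
for _ in range(N):
    u = 0.5 * rng.standard_normal(m)   # probing control input
    w = 0.3 * rng.standard_normal(q)   # probing disturbance input
    xn = A @ x + B @ u + E @ w
    data.append((x, u, w, xn))
    x = xn

# Off-policy policy iteration: evaluate the current target policies (K, L)
# by least squares on the stored batch, then improve them from the kernel.
K, L = np.zeros((m, n)), np.zeros((q, n))  # admissible start: A is stable
d = n + m + q
for _ in range(30):
    Phi, r = [], []
    for x, u, w, xn in data:
        z = np.concatenate([x, u, w])
        zn = np.concatenate([xn, -K @ xn, -L @ xn])  # target-policy actions
        Phi.append(basis(z) - basis(zn))             # Bellman difference
        r.append(x @ Q @ x + u @ R @ u - gamma2 * (w @ w))
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(r), rcond=None)
    H = unpack(theta, d)
    # Improvement: the stationary point of Q over (u, w) gives the new gains.
    G = -np.linalg.solve(H[n:, n:], H[n:, :n])       # [u; w] = G x
    K_new, L_new = -G[:m, :], -G[m:, :]
    if np.linalg.norm(K_new - K) + np.linalg.norm(L_new - L) < 1e-8:
        K, L = K_new, L_new
        break
    K, L = K_new, L_new

print("learned control gain K:", K)
print("learned worst-case disturbance gain L:", L)
```

Note how the probing noise enters only the stored behavior inputs, while the target-policy actions -Kx and -Lx are re-evaluated at the successor state inside each iteration. Keeping these two roles separate is what lets off-policy learning reuse one data batch across all iterations without the probing-noise bias that the abstract's no-bias proof concerns.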
Pages: 28831-28846
Page count: 16