Non-zero-sum games of discrete-time Markov jump systems with unknown dynamics: An off-policy reinforcement learning method

Cited by: 3
Authors
Zhang, Xuewen [1]
Shen, Hao [1]
Li, Feng [1]
Wang, Jing [1]
Affiliations
[1] Anhui Univ Technol, Sch Elect & Informat Engn, Maanshan 243002, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
coupled algebraic Riccati equations; Markov jump systems; non-zero-sum games; off-policy reinforcement learning; linear systems; stability
DOI
10.1002/rnc.7021
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This article addresses the non-zero-sum game problem for discrete-time Markov jump systems without requiring knowledge of the system dynamics. First, the multiplayer non-zero-sum game problem is converted into solving a set of coupled game algebraic Riccati equations, which are difficult to solve directly. Then, to obtain the optimal control policies, a model-based algorithm built on the policy iteration approach is proposed. However, the model-based algorithm relies on system dynamics information, which limits its use in practice. Subsequently, an off-policy reinforcement learning algorithm is given that removes this dependence and uses only the system states and inputs. Moreover, proofs of convergence and of the Nash equilibrium are given. Finally, a numerical example demonstrates the effectiveness of the proposed algorithms.
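To make the model-based step in the abstract concrete, the following is a minimal sketch of policy iteration for a linear-quadratic non-zero-sum game on a discrete-time Markov jump linear system. It is not the authors' algorithm: the linear dynamics x(k+1) = A_i x(k) + sum_j B_{j,i} u_j(k), the quadratic costs with weights Q_j and R_jm, the zero initial gains, and all numerical values are assumptions chosen for illustration. The off-policy, data-driven variant described in the abstract is not shown.

```python
import numpy as np

def evaluate(Ac, W, Pi):
    """Solve the coupled Lyapunov equations
    P_i = W_i + Ac_i^T (sum_l Pi[i, l] P_l) Ac_i for every mode i."""
    M, n = len(Ac), Ac[0].shape[0]
    s = n * n
    G = np.eye(M * s)
    for i in range(M):
        T = np.kron(Ac[i].T, Ac[i].T)  # matrix form of X -> Ac_i^T X Ac_i
        for l in range(M):
            G[i * s:(i + 1) * s, l * s:(l + 1) * s] -= Pi[i, l] * T
    w = np.concatenate([Wi.reshape(-1) for Wi in W])
    p = np.linalg.solve(G, w)
    return [p[i * s:(i + 1) * s].reshape(n, n) for i in range(M)]

def policy_iteration(A, B, Q, R, Pi, iters=200):
    """A[i]: dynamics in mode i; B[j][i]: input matrix of player j in mode i;
    Q[j], R[j][m]: cost weights of player j; Pi: mode transition matrix.
    Policies are u_j = -K[j][i] x in mode i."""
    M, N, n = len(A), len(B), A[0].shape[0]
    # Zero gains are admissible here only because the assumed A[i] are stable.
    K = [[np.zeros((B[j][i].shape[1], n)) for i in range(M)] for j in range(N)]
    for _ in range(iters):
        Ac = [A[i] - sum(B[m][i] @ K[m][i] for m in range(N)) for i in range(M)]
        # Policy evaluation: one set of coupled equations per player.
        P = []
        for j in range(N):
            W = [Q[j] + sum(K[m][i].T @ R[j][m] @ K[m][i] for m in range(N))
                 for i in range(M)]
            P.append(evaluate(Ac, W, Pi))
        # Policy improvement: each player responds to the others' current gains.
        K_new = [[None] * M for _ in range(N)]
        for j in range(N):
            for i in range(M):
                E = sum(Pi[i, l] * P[j][l] for l in range(M))
                A_rest = A[i] - sum(B[m][i] @ K[m][i] for m in range(N) if m != j)
                K_new[j][i] = np.linalg.solve(
                    R[j][j] + B[j][i].T @ E @ B[j][i], B[j][i].T @ E @ A_rest)
        K = K_new
    return P, K

# Hypothetical two-mode, two-player example (values are illustrative only).
A = [np.array([[0.5, 0.1], [0.0, 0.4]]), np.array([[0.3, 0.0], [0.2, 0.5]])]
B = [[np.array([[1.0], [0.0]]), np.array([[1.0], [0.5]])],   # player 1
     [np.array([[0.0], [1.0]]), np.array([[0.2], [1.0]])]]   # player 2
Q = [np.eye(2), np.eye(2)]
R = [[np.eye(1), 0.5 * np.eye(1)], [0.5 * np.eye(1), np.eye(1)]]
Pi = np.array([[0.7, 0.3], [0.4, 0.6]])

P, K = policy_iteration(A, B, Q, R, Pi)
```

In this sketch, P[j][i] approximates the value matrix of player j in mode i and K[j][i] the corresponding feedback gain. Simultaneous policy updates are not guaranteed to converge for every game, so the fixed iteration count is a simplification that suits only this weakly coupled example.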
Pages: 949-968
Number of pages: 20
Related papers
50 in total
  • [1] Optimal tracking control for non-zero-sum games of linear discrete-time systems via off-policy reinforcement learning
    Wen, Yinlei
    Zhang, Huaguang
    Su, Hanguang
    Ren, He
    [J]. OPTIMAL CONTROL APPLICATIONS & METHODS, 2020, 41 (04) : 1233 - 1250
  • [2] Off-policy reinforcement learning for tracking control of discrete-time Markov jump linear systems with completely unknown dynamics
    Huang, Zhen
    Tu, Yidong
    Fang, Haiyang
    Wang, Hai
    Zhang, Liang
    Shi, Kaibo
    He, Shuping
    [J]. Journal of the Franklin Institute, 2023, 360 (03) : 2361 - 2378
  • [3] Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics
    Song, Ruizhuo
    Wei, Qinglai
    Zhang, Huaguang
    Lewis, Frank L.
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (06) : 2929 - 2943
  • [4] Fuzzy-Based Adaptive Optimization of Unknown Discrete-Time Nonlinear Markov Jump Systems With Off-Policy Reinforcement Learning
    Fang, Haiyang
    Tu, Yidong
    Wang, Hai
    He, Shuping
    Liu, Fei
    Ding, Zhengtao
    Cheng, Shing Shin
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2022, 30 (12) : 5276 - 5290
  • [5] H∞ Tracking learning control for discrete-time Markov jump systems: A parallel off-policy reinforcement learning
    Zhang, Xuewen
    Xia, Jianwei
    Wang, Jing
    Chen, Xiangyong
    Shen, Hao
    [J]. JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (18) : 14878 - 14890
  • [6] Zero-sum game-based optimal control for discrete-time Markov jump systems: A parallel off-policy Q-learning method
    Wang, Yun
    Fang, Tian
    Kong, Qingkai
    Li, Feng
    [J]. APPLIED MATHEMATICS AND COMPUTATION, 2024, 467
  • [7] Off-Policy Reinforcement Learning for Optimal Preview Tracking Control of Linear Discrete-Time systems with unknown dynamics
    Wang, Chao-Ran
    Wu, Huai-Ning
    [J]. 2018 CHINESE AUTOMATION CONGRESS (CAC), 2018 : 1402 - 1407
  • [8] Output Feedback H∞ Control of Unknown Discrete-time Linear Systems: Off-policy Reinforcement Learning
    Tooranjipour, Pouria
    Kiumarsi, Bahare
    [J]. 2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021 : 2264 - 2269
  • [9] Off-Policy Reinforcement Learning for Partially Unknown Nonzero-Sum Games
    Zhang, Qichao
    Zhao, Dongbin
    Zhang, Sibo
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 822 - 830
  • [10] H∞ Optimal Control of Unknown Linear Discrete-time Systems: An Off-policy Reinforcement Learning Approach
    Kiumarsi, Bahare
    Modares, Hamidreza
    Lewis, Frank L.
    Jiang, Zhong-Ping
    [J]. PROCEEDINGS OF THE 2015 7TH IEEE INTERNATIONAL CONFERENCE ON CYBERNETICS AND INTELLIGENT SYSTEMS (CIS) AND ROBOTICS, AUTOMATION AND MECHATRONICS (RAM), 2015 : 41 - 46