Model-Free Optimal Tracking Control of Nonlinear Input-Affine Discrete-Time Systems via an Iterative Deterministic Q-Learning Algorithm

Cited by: 30
Authors
Song, Shijie [1 ]
Zhu, Minglei [1 ]
Dai, Xiaolin [1 ]
Gong, Dawei [1 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Peoples R China
Funding
Academy of Finland
Keywords
Heuristic algorithms; Q-learning; Nonlinear dynamical systems; Approximation algorithms; Iterative algorithms; Convergence; Artificial neural networks; Adaptive dynamic programming (ADP); neural network (NN); off-policy technique; optimal tracking control (OTC); CONTROL SCHEME; LINEAR-SYSTEMS;
DOI
10.1109/TNNLS.2022.3178746
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this article, a novel model-free dynamic inversion-based Q-learning (DIQL) algorithm is proposed to solve the optimal tracking control (OTC) problem of unknown nonlinear input-affine discrete-time (DT) systems. Compared with the existing DIQL algorithm and the discount factor-based Q-learning (DFQL) algorithm, the proposed algorithm can eliminate the tracking error while ensuring that it is model-free and off-policy. First, a new deterministic Q-learning iterative scheme is presented, and based on this scheme, a model-based off-policy DIQL algorithm is designed. The advantage of this new scheme is that it can avoid the training of unusual data and improve data utilization, thereby saving computing resources. Simultaneously, the convergence and stability of the designed algorithm are analyzed, and the proof that adding probing noise into the behavior policy does not affect the convergence is presented. Then, by introducing neural networks (NNs), the model-free version of the designed algorithm is further proposed so that the OTC problem can be solved without any knowledge about the system dynamics. Finally, three simulation examples are given to demonstrate the effectiveness of the proposed algorithm.
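The abstract names the ingredients of the approach (a deterministic Q-learning iteration, an off-policy behavior policy with probing noise, and an NN critic for the model-free version) but this record contains no equations, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's DIQL algorithm: a linear plant stands in for the unknown nonlinear input-affine dynamics, quadratic features stand in for the NN critic, a model-based feedforward plays the role of the dynamic-inversion step, and every matrix, gain, and name (A, B, F, L, K, Qe, R) is an illustrative assumption.

```python
import numpy as np

# Illustrative linear plant standing in for the unknown nonlinear input-affine
# dynamics; A and B are used only to simulate data, never handed to the learner.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
# Marginally stable reference generator r_{k+1} = F r_k (an undamped oscillator).
F = np.array([[0.9, 0.1],
              [-1.9, 0.9]])
Qe = np.eye(2)           # tracking-error weight
R = np.array([[1.0]])    # weight on the deviation from the feedforward input

# Dynamic-inversion-style feedforward (model-based here for brevity): a gain L
# with F r = A r + B (L r), so u_d = L r keeps the plant on the reference.
L = np.linalg.lstsq(B, F - A, rcond=None)[0]        # 1 x 2 feedforward gain

ne, nv = 2, 1            # error and input dimensions
nw = ne + nv

def features(w):
    """Independent quadratic monomials so that w^T H w = theta^T phi(w)."""
    phi = [w[i] * w[i] for i in range(nw)]
    phi += [2.0 * w[i] * w[j] for i in range(nw) for j in range(i + 1, nw)]
    return np.array(phi)

def theta_to_H(theta):
    """Rebuild the symmetric Q-function kernel H from the parameter vector."""
    H = np.zeros((nw, nw))
    H[np.diag_indices(nw)] = theta[:nw]
    k = nw
    for i in range(nw):
        for j in range(i + 1, nw):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

# Off-policy data collection: an arbitrary behavior input with probing noise.
rng = np.random.default_rng(0)
N = 400
x = rng.standard_normal(2)
r = np.array([1.0, 0.0])
E, V, C, En = [], [], [], []
for k in range(N):
    e = x - r
    u = 0.3 * np.sin(0.7 * k) + 0.5 * rng.standard_normal(nv)   # probing input
    v = u - L @ r                       # deviation from the feedforward input
    C.append(float(e @ Qe @ e + v @ R @ v))
    E.append(e); V.append(v)
    x = A @ x + B @ u                   # simulate the "unknown" plant
    r = F @ r
    En.append(x - r)
E, V, C, En = map(np.array, (E, V, C, En))
Phi = np.array([features(np.concatenate([e, v])) for e, v in zip(E, V)])

# Iterative deterministic Q-learning (value-iteration style) on the stored data:
# Q_{j+1}(e, v) = c(e, v) + min_{v'} Q_j(e', v'), fitted by least squares.
H = np.zeros((nw, nw))
for _ in range(60):
    Hee, Hev, Hvv = H[:ne, :ne], H[:ne, ne:], H[ne:, ne:]
    if np.allclose(Hvv, 0):
        next_val = np.zeros(N)                        # Q_0 = 0
    else:
        M = Hee - Hev @ np.linalg.solve(Hvv, Hev.T)   # min_v Q_j(e', v)
        next_val = np.einsum('ni,ij,nj->n', En, M, En)
    theta, *_ = np.linalg.lstsq(Phi, C + next_val, rcond=None)
    H = theta_to_H(theta)

K = np.linalg.solve(H[ne:, ne:], H[:ne, ne:].T)       # feedback gain on the error
print("feedforward L =", L, "\nfeedback K =", K)      # tracking law u = L r - K (x - r)
```

Because the learning loop touches only the stored data set, the behavior input (including its probing noise) can differ freely from the greedy target policy, which mirrors the off-policy property the abstract emphasizes; no discount factor is introduced, since the feedforward term removes the persistent part of the control from the cost.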
Pages: 999-1012
Number of pages: 14
Related Papers
50 items in total
  • [1] Optimal tracking control for discrete-time systems by model-free off-policy Q-learning approach
    Li, Jinna
    Yuan, Decheng
    Ding, Zhengtao
    2017 11TH ASIAN CONTROL CONFERENCE (ASCC), 2017: 7-12
  • [2] Model-Free Q-Learning for the Tracking Problem of Linear Discrete-Time Systems
    Li, Chun
    Ding, Jinliang
    Lewis, Frank L.
    Chai, Tianyou
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35(3): 3191-3201
  • [3] H∞ Tracking Control for Linear Discrete-Time Systems: Model-Free Q-Learning Designs
    Yang, Yunjie
    Wan, Yan
    Zhu, Jihong
    Lewis, Frank L.
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5(1): 175-180
  • [4] Stochastic linear quadratic optimal control for model-free discrete-time systems based on Q-learning algorithm
    Wang, Tao
    Zhang, Huaguang
    Luo, Yanhong
    NEUROCOMPUTING, 2018, 312: 1-8
  • [5] Model-free optimal tracking control for discrete-time system with delays using reinforcement Q-learning
    Liu, Yang
    Yu, Rui
    ELECTRONICS LETTERS, 2018, 54(12): 750-751
  • [6] Adjustable Iterative Q-Learning Schemes for Model-Free Optimal Tracking Control
    Qiao, Junfei
    Zhao, Mingming
    Wang, Ding
    Ha, Mingming
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54(2): 1202-1213
  • [7] Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems
    Li, Jinna
    Chai, Tianyou
    Lewis, Frank L.
    Ding, Zhengtao
    Jiang, Yi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30(5): 1308-1320
  • [8] Policy Optimization Adaptive Dynamic Programming for Optimal Control of Input-Affine Discrete-Time Nonlinear Systems
    Lin, Mingduo
    Zhao, Bo
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2023, 53(7): 4339-4350
  • [9] Model-Free Learning Control of Nonlinear Discrete-Time Systems
    Sadegh, Nader
    2011 AMERICAN CONTROL CONFERENCE, 2011: 3553-3558
  • [10] Model-free H∞ control design for unknown linear discrete-time systems via Q-learning with LMI
    Kim, J. -H.
    Lewis, F. L.
    AUTOMATICA, 2010, 46(8): 1320-1326