Generalized Policy Iteration-based Reinforcement Learning Algorithm for Optimal Control of Unknown Discrete-time Systems

Times cited: 0
Authors
Lin, Mingduo [1]
Zhao, Bo [2]
Liu, Derong [1]
Liu, Xi [1]
Luo, Fangchao [1]
Affiliations
[1] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China
[2] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Adaptive dynamic programming; Reinforcement learning; Generalized policy iteration; Neural networks; Optimal control; Unknown discrete-time systems; Affine nonlinear systems
DOI
10.1109/CCDC52312.2021.9601467
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a novel generalized policy iteration-based reinforcement learning (RL) algorithm for infinite-horizon optimal control of nonlinear discrete-time systems with completely unknown dynamics. In this iterative algorithm, two iteration procedures are used to obtain the iterative Q-function and the iterative control policy. The iterative Q-function is obtained by temporal difference (TD) learning, and a policy gradient method is used to directly optimize the iterative control policy. Convergence and optimality analyses of the generalized policy iteration-based RL algorithm are then provided. To implement the algorithm, two neural networks, namely a critic network and an action network, approximate the iterative Q-function and the iterative control policy, respectively. Finally, a numerical simulation example illustrates the effectiveness of the proposed control method.
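As a rough illustration of the generalized policy iteration structure summarized in the abstract, the following is a minimal sketch under stated assumptions, not the authors' implementation; every name and number in it is hypothetical. It assumes a scalar linear plant x_{k+1} = A x_k + B u_k whose coefficients the learner never reads (it only sees sampled transitions). It replaces the critic network with a quadratic Q-function in the features x^2, 2xu, u^2 and replaces the action network with a linear gain u = -gain*x. Each outer iteration runs n_eval policy-evaluation sweeps, each a batch least-squares fit of the TD target U(x, u) + Q(x', -gain*x'), and then takes one gradient step on the gain through the critic.

```python
import random

# Hypothetical "unknown" plant x_{k+1} = A*x_k + B*u_k (NOT the paper's example).
# The learner below never reads A or B; it only sees sampled transitions.
A, B = 1.1, 1.0
Q_W, R_W = 1.0, 1.0  # assumed utility U(x, u) = Q_W*x^2 + R_W*u^2


def plant_step(x, u):
    return A * x + B * u


def utility(x, u):
    return Q_W * x * x + R_W * u * u


def features(x, u):
    # Quadratic critic standing in for the critic network:
    # Q(x, u) = w[0]*x^2 + w[1]*2xu + w[2]*u^2
    return [x * x, 2.0 * x * u, u * u]


def q_value(w, x, u):
    return sum(wi * fi for wi, fi in zip(w, features(x, u)))


def solve3(m, v):
    # Solve the 3x3 system m @ w = v by Gaussian elimination with pivoting.
    m = [row[:] + [vi] for row, vi in zip(m, v)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    w = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        w[r] = (m[r][3] - sum(m[r][k] * w[k] for k in range(r + 1, 3))) / m[r][r]
    return w


def critic_sweep(w, data, gain):
    # One evaluation sweep: least-squares fit of Q(x, u) to the TD target
    # U(x, u) + Q_old(x', -gain*x') over the stored transitions.
    ata = [[0.0] * 3 for _ in range(3)]
    aty = [0.0] * 3
    for x, u, x_next in data:
        phi = features(x, u)
        target = utility(x, u) + q_value(w, x_next, -gain * x_next)
        for i in range(3):
            aty[i] += phi[i] * target
            for j in range(3):
                ata[i][j] += phi[i] * phi[j]
    return solve3(ata, aty)


def gpi(n_eval=5, n_outer=200, step=0.5, gain=0.5, seed=0):
    # gain=0.5 is assumed to be stabilizing: |A - B*0.5| < 1.
    rng = random.Random(seed)
    # Collect off-policy data once, with exploratory random controls.
    data = []
    for _ in range(200):
        x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        data.append((x, u, plant_step(x, u)))
    w = [0.0, 0.0, 0.0]
    for _ in range(n_outer):
        for _ in range(n_eval):  # partial policy evaluation
            w = critic_sweep(w, data, gain)
        # Policy gradient through the critic for the linear actor u = -gain*x:
        # dQ(x, -gain*x)/dgain = x^2 * (2*w[2]*gain - 2*w[1])
        grad = sum(x * x * (2.0 * w[2] * gain - 2.0 * w[1]) for x, _, _ in data)
        gain -= step * grad / len(data)
    return gain, w


if __name__ == "__main__":
    gain, w = gpi()
    print(f"learned gain: {gain:.4f}")
```

In the usual GPI framing, n_eval moves the scheme between value-iteration-like behavior (one evaluation sweep per improvement) and policy-iteration-like behavior (evaluating to convergence). For this assumed linear-quadratic setup the learned gain should approach the discrete-time Riccati gain, about 0.703, which gives an independent check; with neural networks and nonlinear dynamics, as in the paper, such a closed-form check is generally unavailable.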
Pages: 3650-3655
Number of pages: 6