Generalized Policy Iteration-based Reinforcement Learning Algorithm for Optimal Control of Unknown Discrete-time Systems

Authors
Lin, Mingduo [1]
Zhao, Bo [2]
Liu, Derong [1]
Liu, Xi [1]
Luo, Fangchao [1]
Affiliations
[1] Guangdong University of Technology, School of Automation, Guangzhou 510006, People's Republic of China
[2] Beijing Normal University, School of Systems Science, Beijing 100875, People's Republic of China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Adaptive dynamic programming; Reinforcement learning; Generalized policy iteration; Neural networks; Optimal control; Unknown discrete-time systems; Affine nonlinear systems
DOI
10.1109/CCDC52312.2021.9601467
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
This paper presents a novel generalized policy iteration-based reinforcement learning (RL) algorithm for solving infinite-horizon optimal control problems of nonlinear discrete-time systems with completely unknown dynamics. The algorithm uses two iteration procedures, one to obtain the iterative Q-function and one to obtain the iterative control policy. The iterative Q-function is obtained by temporal difference (TD) learning, and the iterative control policy is optimized directly by a policy gradient method. The convergence and optimality of the generalized policy iteration-based RL algorithm are then analyzed. To implement the algorithm, two neural networks, a critic network and an action network, approximate the iterative Q-function and the iterative control policy, respectively. Finally, a numerical simulation example illustrates the effectiveness of the proposed control method.
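The abstract describes the algorithm only at a high level. The Python sketch below illustrates the general structure it outlines: a Q-function critic updated by several temporal-difference backups per outer iteration, followed by a policy-gradient update of the actor, using only data queried from a black-box plant. It is not the authors' implementation. The plant matrices A and B, the quadratic critic basis (standing in for the critic network), the linear actor u = Wx (standing in for the action network), and the parameters n_outer, n_eval, n_pg, lr and n_samples are all hypothetical choices for illustration. The learner only calls step(); A and B are read by riccati_gain() solely as a model-based reference check.

```python
import numpy as np

# Hypothetical plant for illustration only (not the paper's example).
# The learner never reads A or B directly; it only calls step().
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qx = np.eye(2)  # state weight in the utility U(x, u) = x'Qx x + u'Ru u
Ru = np.eye(1)  # control weight
n, m = B.shape


def step(x, u):
    """Black-box simulator: returns the next state x_{k+1}."""
    return A @ x + B @ u


def utility(x, u):
    return float(x @ Qx @ x + u @ Ru @ u)


def features(z):
    """Quadratic basis z_i z_j (i <= j): the critic Q(z) = z'Hz is linear in these."""
    i, j = np.triu_indices(len(z))
    return np.outer(z, z)[i, j]


def weights_to_H(theta, p):
    """Rebuild the symmetric critic matrix H (off-diagonal weights equal 2*H_ij)."""
    H = np.zeros((p, p))
    i, j = np.triu_indices(p)
    H[i, j] = theta
    return (H + H.T) / 2.0


def gpi_q_learning(n_outer=40, n_eval=5, n_pg=50, lr=0.05, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    p = n + m
    # Exploratory data, collected once from the black-box plant (off-policy reuse).
    X = rng.standard_normal((n_samples, n))
    U = rng.standard_normal((n_samples, m))
    Xn = np.array([step(x, u) for x, u in zip(X, U)])
    r = np.array([utility(x, u) for x, u in zip(X, U)])
    Phi = np.array([features(np.concatenate([x, u])) for x, u in zip(X, U)])
    Sigma = X.T @ X / n_samples  # empirical E[x x'] for the policy gradient

    W = np.zeros((m, n))  # actor u = W x; W = 0 is admissible because A is stable
    H = np.zeros((p, p))  # critic Q(x, u) = [x; u]' H [x; u]
    for _ in range(n_outer):
        # Policy evaluation: n_eval temporal-difference backups under the current W,
        # Q_{j+1}(x, u) = U(x, u) + Q_j(x', W x'), each fitted by least squares.
        for _ in range(n_eval):
            Zn = np.hstack([Xn, Xn @ W.T])
            target = r + np.einsum("ki,ij,kj->k", Zn, H, Zn)
            theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
            H = weights_to_H(theta, p)
        # Policy improvement: gradient descent on mean_k Q(x_k, W x_k) w.r.t. W.
        H_ux, H_uu = H[n:, :n], H[n:, n:]
        for _ in range(n_pg):
            grad = 2.0 * (H_ux + H_uu @ W) @ Sigma
            W = W - lr * grad
    return W, H


def riccati_gain(iters=2000):
    """Model-based LQR gain; reads A and B, so it is used only as a reference check."""
    P = Qx.copy()
    for _ in range(iters):
        K = np.linalg.solve(Ru + B.T @ P @ B, B.T @ P @ A)
        P = Qx + A.T @ P @ A - A.T @ P @ B @ K
    return -np.linalg.solve(Ru + B.T @ P @ B, B.T @ P @ A)


if __name__ == "__main__":
    W, _ = gpi_q_learning()
    print("learned actor gain W:", W)
    print("model-based LQR gain:", riccati_gain())
```

Setting n_eval=1 reduces each evaluation phase to a single value-iteration-style backup, while a large n_eval approaches policy iteration; this is the sense in which generalized policy iteration interpolates between the two. A quadratic basis suffices here only because the Q-function of a linear-quadratic problem is exactly quadratic; for the nonlinear systems the paper targets, neural networks take the place of this basis and of the linear actor.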
Pages: 3650-3655
Number of pages: 6