Generalized Policy Iteration-based Reinforcement Learning Algorithm for Optimal Control of Unknown Discrete-time Systems

Cited by: 0

Authors:

Lin, Mingduo [1]

Zhao, Bo [2]

Liu, Derong [1]

Liu, Xi [1]

Luo, Fangchao [1]

Affiliations:

[1] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China

[2] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China

Source:

PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021) | 2021

Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation

Keywords:

Adaptive dynamic programming; Reinforcement learning; Generalized policy iteration; Neural networks; Optimal control; Unknown discrete-time systems; Affine nonlinear systems

DOI:

10.1109/CCDC52312.2021.9601467

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper presents a novel generalized policy iteration-based reinforcement learning (RL) algorithm for infinite-horizon optimal control of nonlinear discrete-time systems with completely unknown dynamics. The proposed iterative algorithm uses two iteration procedures to obtain the iterative Q-function and the iterative control policy: the iterative Q-function is obtained by temporal difference learning, and the policy gradient method directly optimizes the iterative control policy. A convergence and optimality analysis of the algorithm is then provided. To implement the algorithm, two neural networks, a critic network and an action network, approximate the iterative Q-function and the iterative control policy, respectively. Finally, a numerical simulation example illustrates the effectiveness of the proposed control method.
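The two alternating procedures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar linear plant, the quadratic utility `x**2 + u**2`, the quadratic Q-function features (standing in for the critic network), the linear policy `u = -k * x` (standing in for the action network), and every function name and hyperparameter below are assumptions made for illustration. Policy evaluation here is a batch least-squares fit to temporal-difference targets, a simplification of TD learning, and policy improvement takes a single gradient step on the policy parameter through the Q-function.

```python
import numpy as np

# Hypothetical scalar plant, used only to generate data.
# The learner never reads A or B, so it is model-free with respect to the dynamics.
A, B = 0.9, 0.5


def features(x, u):
    """Quadratic features so that Q(x, u) = w @ [x^2, x*u, u^2] (assumed critic form)."""
    return np.stack([x * x, x * u, u * u], axis=-1)


def collect_transitions(n, rng):
    """Sample (x, u, cost, x_next) tuples from the plant with exploratory inputs."""
    x = rng.uniform(-1.0, 1.0, n)
    u = rng.uniform(-1.0, 1.0, n)
    cost = x**2 + u**2  # assumed utility function
    x_next = A * x + B * u
    return x, u, cost, x_next


def evaluate_policy(w, k, data, n_eval):
    """Run n_eval evaluation sweeps: regress Q onto the TD target cost + Q(x', pi(x'))."""
    x, u, cost, x_next = data
    phi = features(x, u)
    for _ in range(n_eval):
        target = cost + features(x_next, -k * x_next) @ w
        w, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return w


def improve_policy(w, k, data, lr):
    """One gradient-descent step on mean Q(x, -k*x) with respect to the gain k."""
    x = data[0]
    u = -k * x
    dq_du = w[1] * x + 2.0 * w[2] * u   # dQ/du evaluated at the policy's action
    grad_k = np.mean(dq_du * (-x))      # chain rule: du/dk = -x
    return k - lr * grad_k


def generalized_policy_iteration(n_iter=200, n_eval=3, lr=0.5, seed=0):
    """Alternate a few evaluation sweeps with one policy-gradient step."""
    rng = np.random.default_rng(seed)
    data = collect_transitions(200, rng)
    w, k = np.zeros(3), 0.0  # k = 0 is stabilizing here because |A| < 1
    for _ in range(n_iter):
        w = evaluate_policy(w, k, data, n_eval)
        k = improve_policy(w, k, data, lr)
    return w, k


w, k = generalized_policy_iteration()
print("learned feedback gain:", k)
```

Under these assumptions the learned gain should approach the discrete-time LQR gain for the toy plant, about 0.62 from the scalar Riccati equation. The parameter `n_eval` controls how many evaluation sweeps run before each policy update, which is what places the scheme between value iteration (one sweep) and policy iteration (sweeps until convergence).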


Pages: 3650-3655

Page count: 6

