Multiple Actor-Critic Structures for Continuous-Time Optimal Control Using Input-Output Data

Cited by: 124

Authors:

Song, Ruizhuo ^{[1]}

Lewis, Frank ^{[2,3]}

Wei, Qinglai ^{[4]}

Zhang, Hua-Guang ^{[5]}

Jiang, Zhong-Ping ^{[6]}

Levine, Dan ^{[7]}

Affiliations:

[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China

[2] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX USA

[3] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Peoples R China

[4] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

[5] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China

[6] NYU, Polytech Sch Engn, Dept Elect & Comp Engn, Brooklyn, NY 11201 USA

[7] Univ Texas Arlington, Dept Psychol, Arlington, TX 76019 USA

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2015 / Vol. 26 / No. 04

Funding:

National Natural Science Foundation of China; National Science Foundation (USA); Beijing Natural Science Foundation;

Keywords:

Actor-critic; approximate dynamic programming (ADP); category; optimal control; shunting inhibitory artificial neural network (SIANN); MULTIOBJECTIVE OPTIMAL-CONTROL; DYNAMIC-PROGRAMMING ALGORITHM; UNKNOWN NONLINEAR-SYSTEMS; OPTIMAL TRACKING CONTROL; OPTIMAL-CONTROL SCHEME; ZERO-SUM GAMES; ADAPTIVE-CONTROL; FEEDBACK-CONTROL; EMOTIONAL INFLUENCES; LEARNING ALGORITHM;

DOI:

10.1109/TNNLS.2015.2399020

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In industrial process control, there may be multiple performance objectives, depending on salient features of the input-output data. Aiming at this situation, this paper proposes multiple actor-critic structures to obtain the optimal control via input-output data for unknown nonlinear systems. The shunting inhibitory artificial neural network (SIANN) is used to classify the input-output data into one of several categories. Different performance measure functions may be defined for disparate categories. The approximate dynamic programming algorithm, which contains model module, critic network, and action network, is used to establish the optimal control in each category. A recurrent neural network (RNN) model is used to reconstruct the unknown system dynamics using input-output data. NNs are used to approximate the critic and action networks, respectively. It is proven that the model error and the closed unknown system are uniformly ultimately bounded. Simulation results demonstrate the performance of the proposed optimal control scheme for the unknown nonlinear system.


Pages: 851-865

Page count: 15

50 items in total

[31] Event-Triggered Adaptive Dynamic Programming for Continuous-time Nonlinear System Using Measured Input-Output Data
Zhong, Xiangnan
Ni, Zhen
He, Haibo
2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,
[32] Input-output Finite-time Stability of Switched Singular Continuous-time Systems
Feng, Tian
Wu, Baowei
Wang, Yue-E
Chen, YangQuan
INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2021, 19 (05) : 1828 - 1835
[33] Input-output Finite-time Stability of Switched Singular Continuous-time Systems
Tian Feng
Baowei Wu
Yue-E Wang
YangQuan Chen
International Journal of Control, Automation and Systems, 2021, 19 : 1828 - 1835
[34] Adaptive Inverse Optimal Control for Rehabilitation Robot Systems Using Actor-Critic Algorithm
Meng, Fancheng
Dai, Yaping
MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014
[35] Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems
Skach, Jan
Kiumarsi, Bahare
Lewis, Frank L.
Straka, Ondrej
IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (01) : 29 - 40
[36] Closed-loop continuous-time model identification with noisy input-output
Victor, Stephane
Diudichi, Arnold
Melchior, Pierre
IFAC PAPERSONLINE, 2017, 50 (01) : 12853 - 12858
[37] Distributed fault diagnosis for continuous-time nonlinear systems: The input-output case
Boem, Francesca
Ferrari, Riccardo M. G.
Parisini, Thomas
Polycarpou, Marios M.
ANNUAL REVIEWS IN CONTROL, 2013, 37 (01) : 163 - 169
[38] Optimal Tracking Control of Unknown Discrete-Time Linear Systems Using Input-Output Measured Data
Kiumarsi, Bahare
Lewis, Frank L.
Naghibi-Sistani, Mohammad-Bagher
Karimpour, Ali
IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (12) : 2770 - 2779
[39] BITANGENTIAL INTERPOLATION FOR INPUT-OUTPUT MAPS OF TIME-VARYING SYSTEMS - THE CONTINUOUS-TIME CASE
BALL, JA
GOHBERG, I
KAASHOEK, MA
INTEGRAL EQUATIONS AND OPERATOR THEORY, 1994, 20 (01) : 1 - 43
[40] Adaptive Optimal Surrounding Control of Multiple Unmanned Surface Vessels via Actor-Critic Reinforcement Learning
Lu, Renzhi
Wang, Xiaotao
Ding, Yiyu
Zhang, Hai-Tao
Zhao, Feng
Zhu, Lijun
He, Yong
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
