Multiple Actor-Critic Structures for Continuous-Time Optimal Control Using Input-Output Data

Cited by: 124
Authors
Song, Ruizhuo [1]
Lewis, Frank [2,3]
Wei, Qinglai [4]
Zhang, Hua-Guang [5]
Jiang, Zhong-Ping [6]
Levine, Dan [7]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[2] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX USA
[3] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[5] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China
[6] NYU, Polytech Sch Engn, Dept Elect & Comp Engn, Brooklyn, NY 11201 USA
[7] Univ Texas Arlington, Dept Psychol, Arlington, TX 76019 USA
Funding
National Natural Science Foundation of China; U.S. National Science Foundation; Beijing Natural Science Foundation
Keywords
Actor-critic; approximate dynamic programming (ADP); category; optimal control; shunting inhibitory artificial neural network (SIANN); MULTIOBJECTIVE OPTIMAL-CONTROL; DYNAMIC-PROGRAMMING ALGORITHM; UNKNOWN NONLINEAR-SYSTEMS; OPTIMAL TRACKING CONTROL; OPTIMAL-CONTROL SCHEME; ZERO-SUM GAMES; ADAPTIVE-CONTROL; FEEDBACK-CONTROL; EMOTIONAL INFLUENCES; LEARNING ALGORITHM;
DOI
10.1109/TNNLS.2015.2399020
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In industrial process control, there may be multiple performance objectives, depending on salient features of the input-output data. To address this situation, this paper proposes multiple actor-critic structures that obtain the optimal control of unknown nonlinear systems from input-output data. A shunting inhibitory artificial neural network (SIANN) classifies the input-output data into one of several categories, and a different performance measure function may be defined for each category. In each category, the optimal control is established by an approximate dynamic programming (ADP) algorithm consisting of a model module, a critic network, and an action network. A recurrent neural network (RNN) model reconstructs the unknown system dynamics from the input-output data, and neural networks (NNs) are used to implement the critic and action networks. It is proven that the model error and the closed-loop system are uniformly ultimately bounded. Simulation results demonstrate the performance of the proposed optimal control scheme for the unknown nonlinear system.
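The abstract's pipeline (classify each input-output sample, then apply that category's own performance measure and actor-critic pair) can be illustrated with a small Python sketch. It is not the authors' implementation. The shunting-inhibitory neuron form, the linear actor, the quadratic critic basis, the helper names (`shunting_activity`, `classify`, `control_step`), and all numbers are assumptions chosen for illustration. The RNN model and all network training are omitted.

```python
import math

# Assumed shunting-inhibitory neuron form (not taken from the paper):
#   activity = tanh(w.I + b) / (a + sigmoid(c.I + d)), with a > 0,
# so the inhibitory term divides (shunts) the excitatory drive.
def shunting_activity(inputs, w, b, c, d, a=1.0):
    excite = math.tanh(sum(wi * xi for wi, xi in zip(w, inputs)) + b)
    inhibit = 1.0 / (1.0 + math.exp(-(sum(ci * xi for ci, xi in zip(c, inputs)) + d)))
    return excite / (a + inhibit)

def classify(sample, neurons):
    """Pick the category whose neuron responds most strongly to an input-output sample."""
    scores = [shunting_activity(sample, **p) for p in neurons]
    return max(range(len(scores)), key=scores.__getitem__)

def quad(v, m):
    """Quadratic form v^T M v."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))

def control_step(state, sample, neurons, pairs):
    """Route to the category's actor/critic pair and evaluate its own stage cost.

    Each (hypothetical) pair holds a linear actor u = -K x, a critic
    V(x) = W . [x1^2, x1*x2, x2^2], and a category-specific stage cost
    r = x^T Q x + u^T R u. Weights are fixed here; no training is shown.
    """
    k = classify(sample, neurons)
    p = pairs[k]
    u = [-sum(kij * xj for kij, xj in zip(row, state)) for row in p["K"]]
    x1, x2 = state
    value = sum(wi * fi for wi, fi in zip(p["W"], (x1 * x1, x1 * x2, x2 * x2)))
    cost = quad(state, p["Q"]) + quad(u, p["R"])
    return k, u, cost, value

# Two illustrative categories split by the sign of the measured output y
# in the sample [y, u_prev]; all numbers are made up for the example.
NEURONS = [
    {"w": [1.0, 0.0], "b": 0.0, "c": [0.0, 0.0], "d": 0.0},
    {"w": [-1.0, 0.0], "b": 0.0, "c": [0.0, 0.0], "d": 0.0},
]
PAIRS = [
    {"K": [[1.0, 0.0]], "W": [1.0, 0.0, 1.0], "Q": [[1.0, 0.0], [0.0, 1.0]], "R": [[1.0]]},
    {"K": [[0.5, 0.5]], "W": [0.5, 0.0, 0.5], "Q": [[2.0, 0.0], [0.0, 2.0]], "R": [[0.1]]},
]

print(control_step([1.0, 2.0], [2.0, 0.0], NEURONS, PAIRS))  # → (0, [-1.0], 6.0, 5.0)
```

The design point the sketch isolates is that the classifier only selects which actor, critic, and cost apply. Each category keeps its own performance measure, so the same state can incur different costs and controls depending on the features of the measured data.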
Pages: 851-865
Page count: 15