Reinforcement Learning Controller Design for Affine Nonlinear Discrete-Time Systems using Online Approximators

Cited by: 154
Authors
Yang, Qinmin [1 ]
Jagannathan, Sarangapani [2 ]
Affiliations
[1] Zhejiang Univ, Dept Control Sci & Engn, State Key Lab Ind Control Technol, Hangzhou 310027, Zhejiang, Peoples R China
[2] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65409 USA
Keywords
Adaptive critic; dynamic programming (DP); Lyapunov method; neural networks (NNs); online approximators (OLAs); online learning; reinforcement learning;
DOI
10.1109/TSMCB.2011.2166384
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper, reinforcement-learning-based adaptive critic controller designs with state feedback and output feedback are proposed using online approximators (OLAs) for a general class of multi-input multi-output unknown affine nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller comprises two entities: an action network that generates an optimal control signal and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function and is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and critic networks, although any OLAs, such as radial basis functions, splines, or fuzzy logic systems, can be utilized. For the output-feedback counterpart, an additional NN serves as an observer to estimate the unavailable system states, so the separation principle is not required. The NN weight tuning laws for both controller schemes are derived, and uniform ultimate boundedness of the closed-loop system is ensured using Lyapunov theory. Finally, the effectiveness of the two controllers is demonstrated in simulation on a pendulum balancing system and a two-link robotic arm system.
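The abstract above describes an actor-critic structure tuned online by heuristic dynamic programming (HDP): the critic learns the cost-to-go and the action network is adjusted to lower the critic's estimate. The sketch below illustrates that structure in generic form only; the toy affine dynamics f and g, the basis functions phi and sigma, the quadratic utility, the discount factor, and the normalized-gradient weight updates are all illustrative assumptions, not the Lyapunov-based tuning laws derived in the paper, and the input-gain matrix g(x) is assumed known here purely to keep the example short.

```python
import numpy as np

# Toy affine plant x_{k+1} = f(x_k) + g(x_k) u_k (both maps are illustrative stand-ins).
def f(x):
    return np.array([x[0] + 0.05 * x[1],
                     x[1] + 0.05 * np.sin(x[0])])

def g(x):
    # Input-gain matrix, assumed known here only to keep the sketch short.
    return np.array([[0.0], [0.05]])

def utility(x, u):
    # One-step cost r(x_k, u_k) = x'x + 0.1 u'u.
    return float(x @ x + 0.1 * (u @ u))

# Linear-in-weight online approximators standing in for the critic and action NNs.
def phi(x):      # critic basis: J_hat(x) = Wc . phi(x) approximates the cost-to-go
    return np.array([x[0] ** 2, x[1] ** 2, x[0] * x[1], np.sin(x[0]) ** 2])

def sigma(x):    # action basis: u_hat(x) = Wa.T @ sigma(x)
    return np.array([x[0], x[1], np.sin(x[0])])

Wc = np.zeros(4)
Wa = np.zeros((3, 1))
alpha_c, alpha_a, gamma = 0.05, 0.01, 0.95   # illustrative learning rates / discount

x = np.array([0.5, 0.0])
for k in range(2000):
    u = Wa.T @ sigma(x)                      # action network output
    x_next = f(x) + g(x) @ u                 # plant step
    r = utility(x, u)

    # Critic: normalized-gradient step on the HDP temporal-difference residual
    # e_c = J_hat(x_k) - [r(x_k, u_k) + gamma * J_hat(x_{k+1})].
    e_c = Wc @ phi(x) - (r + gamma * Wc @ phi(x_next))
    Wc -= alpha_c * e_c * phi(x) / (1.0 + phi(x) @ phi(x))

    # Action: gradient step that lowers r(x, u) + gamma * J_hat(x_next) w.r.t. u,
    # back-propagated through the (assumed known) input gain g(x).
    dJ_dxn = np.array([2 * x_next[0] * Wc[0] + x_next[1] * Wc[2]
                       + 2 * np.sin(x_next[0]) * np.cos(x_next[0]) * Wc[3],
                       2 * x_next[1] * Wc[1] + x_next[0] * Wc[2]])
    dcost_du = 0.2 * u + gamma * (g(x).T @ dJ_dxn)
    Wa -= alpha_a * np.outer(sigma(x), dcost_du) / (1.0 + sigma(x) @ sigma(x))

    x = x_next

print("final state:", x, "  final control:", Wa.T @ sigma(x))
```

The normalized denominators (1 + ||phi||^2 and 1 + ||sigma||^2) are a common way to keep online gradient steps bounded when the regressors grow; the paper's own weight tuning laws instead guarantee uniform ultimate boundedness by construction via the Lyapunov analysis.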
Pages: 377-390
Page count: 14
Related Papers
50 records in total
  • [31] Online reinforcement learning control of unknown nonaffine nonlinear discrete time systems
    Yang, Qinmin
    Jagannathan, S.
    PROCEEDINGS OF THE 46TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2007, : 5835 - 5840
  • [32] Iterative learning scheme design for a class of discrete-time nonlinear systems
    Zhang, Yamiao
    Liu, Jian
    Yuan, Yuan
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 1879 - 1884
  • [33] Stabilizing regions of PID controller for a class of unknown nonlinear non-affine discrete-time systems
    Xiong, Shuangshuang
    Hou, Zhongsheng
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2022, 32 (18) : 9421 - 9437
  • [34] Using discrete-time controller to globally stabilize a class of feedforward nonlinear systems
    Du, Haibo
    Qian, Chunjiang
    Li, Shihua
    Yang, Shizhong
    2011 50TH IEEE CONFERENCE ON DECISION AND CONTROL AND EUROPEAN CONTROL CONFERENCE (CDC-ECC), 2011, : 8309 - 8314
  • [35] Observer Design for Discrete-Time Nonlinear Systems
    Lin, Wei
    Wei, Jinfeng
    Wan, Feng
    47TH IEEE CONFERENCE ON DECISION AND CONTROL, 2008 (CDC 2008), 2008, : 5402 - 5407
  • [36] Observer design for discrete-time nonlinear systems
    Sundarapandian, V
    MATHEMATICAL AND COMPUTER MODELLING, 2002, 35 (1-2) : 37 - 44
  • [37] Design of preview controller for periodic discrete-time systems
    Li L.
    Ren Z.-Q.
    Yu X.
    Kongzhi yu Juece/Control and Decision, 2022, 37 (10): 2585 - 2592
  • [38] Constrained optimal control of affine nonlinear discrete-time systems using GHJB method
    School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China
    ADPRL: 2009 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2009, : 16 - 21
  • [39] IDENTIFICATION AND CONTROL OF DISCRETE-TIME NONLINEAR SYSTEMS USING AFFINE SUPPORT VECTOR MACHINES
    Zhang, Li
    Xi, Yu-Geng
    Zhou, Wei-Da
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2009, 18 (06) : 929 - 947
  • [40] Constrained Optimal Control of Affine Nonlinear Discrete-Time Systems Using GHJB Method
    Cui, Lili
    Zhang, Huaguang
    Liu, Derong
    Kim, Yongsu
    ADPRL: 2009 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2009, : 16 - +