Deep reinforcement learning based finite-horizon optimal control for a discrete-time affine nonlinear system

Cited by: 0
Authors
Kim, Jong Woo [1]
Park, Byung Jun [1]
Yoo, Haeun [2]
Lee, Jay H. [2]
Lee, Jong Min [1]
Affiliations
[1] Seoul Natl Univ, Sch Chem & Biol Engn, Inst Chem Proc, 1 Gwanak Ro, Seoul 08826, South Korea
[2] Korea Adv Inst Sci & Technol, Chem & Biomol Engn Dept, Daejeon 34141, South Korea
Keywords
Reinforcement learning; Approximate dynamic programming; Deep learning; Actor-critic method; Finite horizon optimal control; DESIGN;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Approximate dynamic programming (ADP) aims to obtain an approximate numerical solution to the discrete-time Hamilton-Jacobi-Bellman (HJB) equation. Heuristic dynamic programming (HDP) is a two-stage iterative ADP scheme that separates the HJB equation into two equations, one for the value function and another for the policy function, which are referred to as the critic and the actor, respectively. Previous ADP implementations have been limited by the choice of function approximator, which requires either significant prior domain knowledge or a large number of parameters to be fitted. However, recent advances in deep learning from the computer science community enable the use of deep neural networks (DNNs) to approximate high-dimensional nonlinear functions without prior domain knowledge. Motivated by this, we examine the potential of DNNs as function approximators for the critic and the actor. In contrast to the infinite-horizon optimal control problem, the critic and the actor of the finite-horizon optimal control (FHOC) problem are time-varying functions and must satisfy a boundary condition. A DNN structure and a training algorithm suitable for FHOC are presented. Illustrative examples are provided to demonstrate the validity of the proposed method.
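To make the finite-horizon structure concrete, the sketch below works through the scalar linear-quadratic special case, where the time-varying critic and actor have closed forms. It is not the DNN-based method of the paper: the dynamics x_{k+1} = a*x_k + b*u_k, the quadratic cost, the function name `finite_horizon_lqr`, and all numeric values are assumptions chosen only for illustration. It shows the two features the abstract highlights: the value function is fixed at the final stage by a boundary condition, and both the value function and the policy change with the time index.

```python
def finite_horizon_lqr(a, b, q, r, q_terminal, horizon):
    """Backward recursion for V_k(x) = p[k] * x**2 and u_k = -gain[k] * x.

    Assumed (illustrative) problem: x_{k+1} = a*x_k + b*u_k with cost
    sum_{k<N} (q*x_k**2 + r*u_k**2) + q_terminal * x_N**2.
    """
    p = [0.0] * (horizon + 1)
    gain = [0.0] * horizon
    # Boundary condition: the critic at the final stage equals the terminal cost.
    p[horizon] = q_terminal
    for k in range(horizon - 1, -1, -1):
        # Actor step: minimizer over u of q*x^2 + r*u^2 + p[k+1]*(a*x + b*u)^2.
        gain[k] = a * b * p[k + 1] / (r + b * b * p[k + 1])
        # Critic step: cost-to-go from stage k when applying that minimizer.
        closed_loop = a - b * gain[k]
        p[k] = q + r * gain[k] ** 2 + p[k + 1] * closed_loop ** 2
    return p, gain


# Hypothetical parameter values, not taken from the paper.
p, gain = finite_horizon_lqr(a=1.1, b=0.5, q=1.0, r=0.1, q_terminal=5.0, horizon=10)

for k in (0, 5, 9):
    print(f"stage {k}: value coefficient {p[k]:.4f}, feedback gain {gain[k]:.4f}")
```

In the general affine nonlinear setting these closed forms are unavailable, which is where function approximators such as DNNs take the place of `p` and `gain`.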
Pages: 567-572
Number of pages: 6
Related papers
50 in total
  • [31] Optimal Control of Affine Nonlinear Discrete-time Systems
    Dierks, Travis
    Jagannathan, S.
    MED: 2009 17TH MEDITERRANEAN CONFERENCE ON CONTROL & AUTOMATION, VOLS 1-3, 2009, : 1390 - 1395
  • [32] Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning
    Zhang, Lingzhi
    Xie, Lei
    Jiang, Yi
    Li, Zhishan
    Liu, Xueqin
    Su, Hongye
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 12
  • [33] Finite-Horizon Neural Network-based Optimal Control Design for Affine Nonlinear Continuous-time Systems
    Zhao, Qiming
    Xu, Hao
    Dierks, Travis
    Jagannathan, S.
    2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [34] Adaptive dynamic programming for finite-horizon optimal control of linear time-varying discrete-time systems
    Pang, Bo
    Bian, Tao
    Jiang, Zhong-Ping
    CONTROL THEORY AND TECHNOLOGY, 2019, 17 (01) : 73 - 84
  • [35] Data-Driven Finite-Horizon Approximate Optimal Control for Discrete-Time Nonlinear Systems Using Iterative HDP Approach
    Mu, Chaoxu
    Wang, Ding
    He, Haibo
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (10) : 2948 - 2961
  • [36] Data-driven Finite-horizon Optimal Control for Linear Time-varying Discrete-time Systems
    Pang, Bo
    Bian, Tao
    Jiang, Zhong-Ping
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 861 - 866
  • [37] Adaptive dynamic programming for finite-horizon optimal control of linear time-varying discrete-time systems
    Bo Pang
    Tao Bian
    Zhong-Ping Jiang
    Control Theory and Technology, 2019, 17 : 73 - 84
  • [38] Finite-horizon optimal control for affine nonlinear systems with dead-zone control input
    Lin, Xiaofeng
    Ding, Qiang
    2013 CHINESE AUTOMATION CONGRESS (CAC), 2013, : 665 - 670
  • [39] Inverse Optimal Control for Finite-Horizon Discrete-time Linear Quadratic Regulator Under Noisy Output
    Zhang, Han
    Li, Yibei
    Hu, Xiaoming
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 6663 - 6668
  • [40] A nested computational approach to the discrete-time finite-horizon LQ control problem
    Marro, G
    Prattichizzo, D
    Zattoni, E
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2003, 42 (03) : 1002 - 1012