Deep reinforcement learning based finite-horizon optimal control for a discrete-time affine nonlinear system

Citations: 0
Authors
Kim, Jong Woo [1 ]
Park, Byung Jun [1 ]
Yoo, Haeun [2 ]
Lee, Jay H. [2 ]
Lee, Jong Min [1 ]
Affiliations
[1] Seoul Natl Univ, Sch Chem & Biol Engn, Inst Chem Proc, 1 Gwanak Ro, Seoul 08826, South Korea
[2] Korea Adv Inst Sci & Technol, Chem & Biomol Engn Dept, Daejeon 34141, South Korea
Keywords
Reinforcement learning; Approximate dynamic programming; Deep learning; Actor-critic method; Finite horizon optimal control; DESIGN;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Approximate dynamic programming (ADP) aims to obtain an approximate numerical solution to the discrete-time Hamilton-Jacobi-Bellman (HJB) equation. Heuristic dynamic programming (HDP) is a two-stage iterative ADP scheme that separates the HJB equation into two equations, one for the value function and another for the policy function, referred to as the critic and the actor, respectively. Previous ADP implementations have been limited by the choice of function approximator, which requires either significant prior domain knowledge or a large number of parameters to be fitted. Recent advances in deep learning, however, enable deep neural networks (DNNs) to approximate high-dimensional nonlinear functions without prior domain knowledge. Motivated by this, we examine the potential of DNNs as function approximators for the critic and the actor. In contrast to the infinite-horizon optimal control problem, the critic and the actor of the finite-horizon optimal control (FHOC) problem are time-varying functions and must satisfy a boundary condition. A DNN structure and a training algorithm suitable for FHOC are presented, and illustrative examples demonstrate the validity of the proposed method.
Pages: 567-572 (6 pages)
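The following is a minimal sketch, not the authors' implementation, of the two-stage HDP actor-critic iteration described in the abstract, applied to an assumed scalar affine system x_{k+1} = f(x_k) + g(x_k) u_k with a quadratic cost. The dynamics f and g, the horizon N, the cost weights, the network sizes, and the choice of feeding the normalized stage index k/N to a single critic network and a single actor network (to represent the time-varying functions) are all illustrative assumptions; the boundary condition is handled by returning the terminal cost directly at the final stage.

```python
# A minimal HDP-style actor-critic sketch for a finite-horizon problem.
# Everything below (system, costs, horizon, network sizes, hyperparameters)
# is an illustrative assumption, not taken from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

N = 10                      # assumed horizon length
Q, R, QF = 1.0, 0.1, 2.0    # assumed state, input, and terminal cost weights

def f(x):                   # drift of an assumed scalar affine system x+ = f(x) + g(x)*u
    return 0.9 * x + 0.1 * torch.sin(x)

def g(x):                   # input gain of the assumed system
    return 1.0 + 0.1 * x ** 2

def stage_cost(x, u):
    return Q * x ** 2 + R * u ** 2

def terminal_cost(x):
    return QF * x ** 2

def mlp():
    # Inputs are (state, normalized stage index k/N) so a single network can
    # represent the time-varying critic or actor over the whole horizon.
    return nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                         nn.Linear(32, 32), nn.Tanh(),
                         nn.Linear(32, 1))

critic, actor = mlp(), mlp()
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)

def value(x, k):
    # Boundary condition: V_N(x) equals the terminal cost exactly.
    if k == N:
        return terminal_cost(x)
    t = torch.full_like(x, k / N)
    return critic(torch.stack([x, t], dim=-1)).squeeze(-1)

def policy(x, k):
    t = torch.full_like(x, k / N)
    return actor(torch.stack([x, t], dim=-1)).squeeze(-1)

for it in range(2000):                       # HDP sweeps
    x = 4.0 * (torch.rand(256) - 0.5)        # sampled training states in [-2, 2]
    k = int(torch.randint(0, N, (1,)))       # random stage index in {0, ..., N-1}

    # Critic step: regress V_k(x) onto the one-step Bellman target
    # under the current policy.
    with torch.no_grad():
        u = policy(x, k)
        target = stage_cost(x, u) + value(f(x) + g(x) * u, k + 1)
    loss_c = ((value(x, k) - target) ** 2).mean()
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()

    # Actor step: minimize stage cost plus the critic's cost-to-go estimate;
    # only the actor's parameters are updated here.
    u = policy(x, k)
    loss_a = (stage_cost(x, u) + value(f(x) + g(x) * u, k + 1)).mean()
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
```

The two updates mirror the value-equation/policy-equation split into critic and actor described in the abstract. Imposing the terminal cost exactly at k = N is one simple way to satisfy the boundary condition, and feeding k/N as a network input is one of several ways to represent time-varying critic and actor functions (per-stage networks are another).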