Structured Online Learning-based Control of Continuous-time Nonlinear Systems

Cited by: 4

Authors:

Farsi, Milad [1]

Liu, Jun [1]

Affiliations:

[1] Univ Waterloo, Appl Math Dept, Waterloo, ON, Canada

Source:

IFAC PAPERSONLINE | 2020 / Vol. 53 / Issue 02

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Reinforcement learning; Model-based learning; Optimal control; Feedback control; Continuous-time control; Adaptive dynamic programming; Sparse identification;

DOI:

10.1016/j.ifacol.2020.12.2299

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Model-based reinforcement learning techniques accelerate the learning task by employing a transition model to make predictions. In this paper, a model-based learning approach is presented that iteratively computes the optimal value function based on the most recent update of the model. Assuming a structured continuous-time model of the system in terms of a set of bases, we formulate an infinite-horizon optimal control problem addressing a given control objective. The structure of the system along with a value function parameterized in the quadratic form provides flexibility in analytically calculating an update rule for the parameters. Hence, a matrix differential equation of the parameters is obtained, where the solution is used to characterize the optimal feedback control in terms of the bases, at any time step. Moreover, the quadratic form of the value function suggests a compact way of updating the parameters that considerably decreases the computational complexity. Considering the state-dependency of the differential equation, we exploit the obtained framework as an online learning-based algorithm. In the numerical results, the presented algorithm is implemented on four nonlinear benchmark examples, where the regulation problem is successfully solved while an identified model of the system is obtained with a bounded prediction error. Copyright (C) 2020 The Authors.


Pages: 8142 - 8149

Page count: 8

50 records in total

[31] Inverse optimal control for deterministic continuous-time nonlinear systems
Johnson, Miles
Aghasadeghi, Navid
Bretl, Timothy
2013 IEEE 52ND ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2013, : 2906 - 2913
[32] Control of continuous-time nonlinear systems using neural networks
He, SL
Reif, K
Unbehauen, R
INTERNATIONAL WORKSHOP ON NEURAL NETWORKS FOR IDENTIFICATION, CONTROL, ROBOTICS, AND SIGNAL/IMAGE PROCESSING - PROCEEDINGS, 1996, : 402 - 409
[33] Tracking control of nonlinear lumped mechanical continuous-time systems: A model-based iterative learning approach
Smolders, K.
Volckaert, M.
Swevers, J.
MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2008, 22 (08) : 1896 - 1916
[34] Actuator fault tolerant control in nonlinear continuous-time systems
Jiang, Bin
Staroswiecki, Marcel
WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 5483 - +
[35] Data-Based Self-Learning Optimal Control for Continuous-Time Unknown Nonlinear Systems With Disturbance
Wei, Qinglai
Liu, Derong
Song, Ruizhuo
Yan, Pengfei
PROCEEDINGS OF THE 28TH CHINESE CONTROL AND DECISION CONFERENCE (2016 CCDC), 2016, : 6633 - 6638
[36] Stochastic Sampling Control for A Class of Nonlinear Continuous-time Systems
Fan, Xing
Jia, Xinchun
Chi, Xiaobo
Wang, Xiaokai
2010 CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-5, 2010, : 3140 - +
[37] Online reinforcement learning for a class of partially unknown continuous-time nonlinear systems via value iteration
Su, Hanguang
Zhang, Huaguang
Zhang, Kun
Gao, Wenzhong
OPTIMAL CONTROL APPLICATIONS & METHODS, 2018, 39 (02) : 1011 - 1028
[38] Neuro-Control for Continuous-Time Stochastic Nonlinear Systems via Online Policy Iteration Algorithm
Zhou, Tianmin
Hou, Jiaxu
Li, Handong
Di, Zengru
Zhao, Bo
PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 1499 - 1503
[39] An Online Actor/Critic Algorithm for Event-Triggered Optimal Control of Continuous-Time Nonlinear Systems
Vamvoudakis, Kyriakos G.
2014 AMERICAN CONTROL CONFERENCE (ACC), 2014, : 1 - 6
[40] Constrained Online Optimal Control for Continuous-Time Nonlinear Systems Using Neuro-Dynamic Programming
Yang Xiong
Liu Derong
Wang Ding
Ma Hongwen
2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014, : 8717 - 8722
