Continuous-Time Q-Learning for Infinite-Horizon Discounted Cost Linear Quadratic Regulator Problems

Cited by: 73

Authors:

Palanisamy, Muthukumar [1,2]

Modares, Hamidreza [2]

Lewis, Frank L. [2]

Aurangzeb, Muhammad [2]

Affiliations:

[1] Gandhigram Rural Inst Deemed Univ, Dept Math, Gandhigram 624302, India

[2] Univ Texas Arlington Res Inst, Ft Worth, TX 76118 USA

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2015 / Vol. 45 / Issue 02

Funding:

U.S. National Science Foundation;

Keywords:

Approximate dynamic programming (ADP); continuous-time dynamical systems; infinite-horizon discounted cost function; integral reinforcement learning (IRL); optimal control; Q-learning; value iteration (VI); ADAPTIVE OPTIMAL-CONTROL; ITERATION; SYSTEMS;

DOI:

10.1109/TCYB.2014.2322116

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper presents a method of Q-learning to solve the discounted linear quadratic regulator (LQR) problem for continuous-time (CT) continuous-state systems. Most available methods in the existing literature for CT systems to solve the LQR problem generally need partial or complete knowledge of the system dynamics. Q-learning is effective for unknown dynamical systems, but has generally been well understood only for discrete-time systems. The contribution of this paper is to present a Q-learning methodology for CT systems which solves the LQR problem without having any knowledge of the system dynamics. A natural and rigorously justified parameterization of the Q-function is given in terms of the state, the control input, and its derivatives. This parameterization allows the implementation of an online Q-learning algorithm for CT systems. The simulation results supporting the theoretical development are also presented.
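For orientation, the discounted LQR problem named in the abstract can be illustrated with a minimal model-based sketch. This is not the paper's Q-learning algorithm: it assumes the dynamics are known, which is exactly the requirement the paper removes. It only shows the kind of solution a model-free method would be expected to approach. The scalar setup, the function name `discounted_lqr_scalar`, and the symbols `a`, `b`, `q`, `r`, `gamma` are all assumptions introduced here for illustration; none of them come from the record above.

```python
import math


def discounted_lqr_scalar(a, b, q, r, gamma):
    """Model-based solution of a scalar discounted LQR problem.

    Assumed setup (illustrative only, not taken from the paper):
        dynamics:  dx/dt = a*x + b*u
        cost:      integral over t >= 0 of exp(-gamma*t) * (q*x**2 + r*u**2)

    The value function is p*x**2, where p is the positive root of the
    scalar discounted Riccati equation
        2*a*p - gamma*p + q - (b**2 / r) * p**2 = 0,
    and the optimal feedback is u = -k*x with k = b*p/r.
    """
    if b == 0 or q <= 0 or r <= 0 or gamma < 0:
        raise ValueError("expected b != 0, q > 0, r > 0, gamma >= 0")
    s = b * b / r                 # coefficient of the quadratic term
    a_shift = 2.0 * a - gamma     # the discount acts like a shift of 'a'
    # Positive root of s*p**2 - a_shift*p - q = 0.
    p = (a_shift + math.sqrt(a_shift ** 2 + 4.0 * s * q)) / (2.0 * s)
    k = b * p / r
    return p, k


if __name__ == "__main__":
    p, k = discounted_lqr_scalar(a=1.0, b=1.0, q=1.0, r=1.0, gamma=0.1)
    print("value coefficient p =", p)
    print("feedback gain k =", k)
```

With `gamma = 0` the equation reduces to the usual undiscounted scalar Riccati equation, so the sketch can be checked against that familiar case.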

Pages: 165-176

Page count: 12

