Unconstrained feedback controller design using Q-learning from noisy data

Cited by: 0
Authors
Kumar, Pratyush [1 ]
Rawlings, James B. [1 ]
Affiliations
[1] Univ Calif Santa Barbara, Dept Chem Engn, Santa Barbara, CA 93106 USA
Keywords
Reinforcement learning; Q-learning; Least squares policy iteration; System identification; Maximum likelihood estimation; Linear quadratic regulator; MODEL-PREDICTIVE CONTROL; REINFORCEMENT; STABILITY; MPC;
DOI
10.1016/j.compchemeng.2023.108325
Chinese Library Classification
TP39 [Computer Applications];
Discipline Classification Code
081203; 0835;
Abstract
This paper develops a novel model-free Q-learning based approach to estimate linear, unconstrained feedback controllers from noisy process data. The proposed method extends an available approach developed to estimate the linear quadratic regulator (LQR) for linear systems with full state measurements driven by Gaussian process noise of known covariance. First, we modify the approach to treat the case of an unknown noise covariance. Then, we use the modified approach to estimate a feedback controller for linear systems with both process and measurement noise and only output measurements. We also present a model-based maximum likelihood estimation (MLE) approach to determine a linear dynamic model and noise covariances from data, which is used to construct a regulator and state estimator for comparisons in simulation studies. The performance of the model-free and model-based controller estimation approaches is compared on an example heating, ventilation, and air-conditioning (HVAC) system. We show that the proposed Q-learning approach estimates a reasonably accurate feedback controller from 24 h of noisy data. The controllers estimated using the model-free and model-based approaches provide similar closed-loop performance, with losses of 3.5% and 2.7%, respectively, compared to a perfect controller that uses the true dynamic model and noise covariances of the HVAC system. Finally, we give future work directions for the model-free controller design approaches by discussing some remaining advantages of the model-based approaches.
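The core idea the abstract builds on — estimating an LQR gain by Q-learning with least squares policy iteration, rather than identifying a model first — can be illustrated with a minimal sketch. This is not the paper's actual algorithm (which also handles output measurements and unknown noise covariance); it is a textbook LSTD-Q policy iteration for a discounted LQR with full state measurements. The system matrices, noise level, discount factor, and sample counts below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stable 2-state, 1-input system (not the paper's HVAC model).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.1], [0.5]])
Qc, Rc = np.eye(2), np.array([[1.0]])
gamma = 0.98            # discount factor (assumed)
sig_w = 0.05            # process-noise standard deviation (assumed)
n, m = 2, 1
d = n + m               # dimension of z = [x; u]

def phi(z):
    """Quadratic monomials of z plus a constant, so Q(z) = z'Hz + q0 is linear in theta."""
    f = [z[i] * z[j] for i in range(d) for j in range(i, d)]
    f.append(1.0)
    return np.array(f)

def theta_to_H(theta):
    """Recover the symmetric matrix H from the quadratic-feature weights."""
    H = np.zeros((d, d))
    k = 0
    for i in range(d):
        for j in range(i, d):
            H[i, j] = theta[k] if i == j else theta[k] / 2.0
            H[j, i] = H[i, j]
            k += 1
    return H

def lspi_iteration(K, N=4000, sig_e=1.0):
    """Evaluate Q^K by LSTD-Q from one noisy rollout, then return the improved gain."""
    x = np.zeros(n)
    p = d * (d + 1) // 2 + 1
    Amat, bvec = np.zeros((p, p)), np.zeros(p)
    for _ in range(N):
        u = K @ x + sig_e * rng.standard_normal(m)          # exploratory input
        cost = x @ Qc @ x + u @ Rc @ u
        xp = A @ x + B @ u + sig_w * rng.standard_normal(n)  # noisy transition
        z, zp = np.concatenate([x, u]), np.concatenate([xp, K @ xp])
        f = phi(z)
        Amat += np.outer(f, f - gamma * phi(zp))             # LSTD-Q normal equations
        bvec += f * cost
        x = xp
    H = theta_to_H(np.linalg.solve(Amat, bvec))
    return -np.linalg.solve(H[n:, n:], H[n:, :n])            # K+ = -Huu^{-1} Hux

K = np.zeros((m, n))    # initial stabilizing policy (A is stable, so K = 0 works)
for _ in range(5):
    K = lspi_iteration(K)

# Discounted LQR gain from the true model, for comparison only.
P = Qc.copy()
for _ in range(500):
    P = Qc + gamma * A.T @ P @ A - gamma**2 * A.T @ P @ B @ np.linalg.solve(
        Rc + gamma * B.T @ P @ B, B.T @ P @ A)
K_lqr = -gamma * np.linalg.solve(Rc + gamma * B.T @ P @ B, B.T @ P @ A)
print("estimated:", K, " model-based:", K_lqr)
```

Note the role of the constant feature in `phi`: the process noise shifts the expected next-stage Q-value by a constant (a trace term in the noise covariance), and the constant basis function absorbs that offset, which is why the quadratic weights — and hence the gain — can be recovered without knowing the noise covariance.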
Pages: 13