Unconstrained feedback controller design using Q-learning from noisy data

Cited by: 0
Authors
Kumar, Pratyush [1 ]
Rawlings, James B. [1 ]
Affiliations
[1] Univ Calif Santa Barbara, Dept Chem Engn, Santa Barbara, CA 93106 USA
Keywords
Reinforcement learning; Q-learning; Least squares policy iteration; System identification; Maximum likelihood estimation; Linear quadratic regulator; MODEL-PREDICTIVE CONTROL; REINFORCEMENT; STABILITY; MPC;
DOI
10.1016/j.compchemeng.2023.108325
Chinese Library Classification
TP39 [Computer Applications];
Discipline Classification Code
081203; 0835;
Abstract
This paper develops a novel model-free Q-learning based approach to estimate linear, unconstrained feedback controllers from noisy process data. The proposed method extends an available approach developed to estimate the linear quadratic regulator (LQR) for linear systems with full state measurements driven by Gaussian process noise of known covariance. First, we modify the approach to treat the case of an unknown noise covariance. Then, we use the modified approach to estimate a feedback controller for linear systems with both process and measurement noise and only output measurements. We also present a model-based maximum likelihood estimation (MLE) approach to determine a linear dynamic model and noise covariances from data, which is used to construct a regulator and state estimator for comparisons in simulation studies. The performance of the model-free and model-based controller estimation approaches is compared on an example heating, ventilation, and air-conditioning (HVAC) system. We show that the proposed Q-learning approach estimates a reasonably accurate feedback controller from 24 h of noisy data. The controllers estimated using the model-free and model-based approaches provide similar closed-loop performance, with losses of 3.5% and 2.7%, respectively, compared to a perfect controller that uses the true dynamic model and noise covariances of the HVAC system. Finally, we give future work directions for the model-free controller design approaches by discussing some remaining advantages of the model-based approaches.
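The core idea the abstract builds on — estimating an LQR gain by Q-learning with least squares policy iteration, rather than identifying a model first — can be illustrated with a minimal sketch. This is not the paper's actual algorithm (which also handles output measurements and unknown noise covariance); it is a textbook LSTD-Q policy iteration for a discounted LQR with full state measurements. The system matrices, noise level, discount factor, and sample counts below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stable 2-state, 1-input system (not the paper's HVAC model).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.1], [0.5]])
Qc, Rc = np.eye(2), np.array([[1.0]])
gamma = 0.98            # discount factor (assumed)
sig_w = 0.05            # process-noise standard deviation (assumed)
n, m = 2, 1
d = n + m               # dimension of z = [x; u]

def phi(z):
    """Quadratic monomials of z plus a constant, so Q(z) = z'Hz + q0 is linear in theta."""
    f = [z[i] * z[j] for i in range(d) for j in range(i, d)]
    f.append(1.0)
    return np.array(f)

def theta_to_H(theta):
    """Recover the symmetric matrix H from the quadratic-feature weights."""
    H = np.zeros((d, d))
    k = 0
    for i in range(d):
        for j in range(i, d):
            H[i, j] = theta[k] if i == j else theta[k] / 2.0
            H[j, i] = H[i, j]
            k += 1
    return H

def lspi_iteration(K, N=4000, sig_e=1.0):
    """Evaluate Q^K by LSTD-Q from one noisy rollout, then return the improved gain."""
    x = np.zeros(n)
    p = d * (d + 1) // 2 + 1
    Amat, bvec = np.zeros((p, p)), np.zeros(p)
    for _ in range(N):
        u = K @ x + sig_e * rng.standard_normal(m)          # exploratory input
        cost = x @ Qc @ x + u @ Rc @ u
        xp = A @ x + B @ u + sig_w * rng.standard_normal(n)  # noisy transition
        z, zp = np.concatenate([x, u]), np.concatenate([xp, K @ xp])
        f = phi(z)
        Amat += np.outer(f, f - gamma * phi(zp))             # LSTD-Q normal equations
        bvec += f * cost
        x = xp
    H = theta_to_H(np.linalg.solve(Amat, bvec))
    return -np.linalg.solve(H[n:, n:], H[n:, :n])            # K+ = -Huu^{-1} Hux

K = np.zeros((m, n))    # initial stabilizing policy (A is stable, so K = 0 works)
for _ in range(5):
    K = lspi_iteration(K)

# Discounted LQR gain from the true model, for comparison only.
P = Qc.copy()
for _ in range(500):
    P = Qc + gamma * A.T @ P @ A - gamma**2 * A.T @ P @ B @ np.linalg.solve(
        Rc + gamma * B.T @ P @ B, B.T @ P @ A)
K_lqr = -gamma * np.linalg.solve(Rc + gamma * B.T @ P @ B, B.T @ P @ A)
print("estimated:", K, " model-based:", K_lqr)
```

Note the role of the constant feature in `phi`: the process noise shifts the expected next-stage Q-value by a constant (a trace term in the noise covariance), and the constant basis function absorbs that offset, which is why the quadratic weights — and hence the gain — can be recovered without knowing the noise covariance.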
Pages: 13