A Q-learning predictive control scheme with guaranteed stability

Cited by: 9
Authors
Beckenbach, Lukas [1]
Osinenko, Pavel [1]
Streif, Stefan [1]
Affiliations
[1] Tech Univ Chemnitz, Automat Control & Syst Dynam Lab, D-09107 Chemnitz, Germany
Keywords
Predictive control; Q-Learning; Cost shaping; Nominal stability; RECEDING-HORIZON CONTROL; DISCRETE-TIME-SYSTEMS; NONLINEAR-SYSTEMS; FINITE; PERFORMANCE; MPC; STATE
DOI
10.1016/j.ejcon.2020.03.001
CLC number (Chinese Library Classification)
TP [automation technology, computer technology]
Subject classification code
0812
Abstract
Model-based predictive controllers are used to tackle control tasks in which constraints on the state, the input, or both must be satisfied. These controllers commonly optimize a fixed finite-horizon cost, which relates to an infinite-horizon (IH) cost profile, while the resulting closed loop under the predictive controller yields an IH cost that is in general suboptimal. To capture the optimal IH cost and the associated control policy, reinforcement learning methods such as Q-learning, which approximate this cost via a parametric architecture, can be employed. In contrast to predictive controllers, however, closed-loop stability under the controller associated with such an approximation has rarely been investigated in explicit dependence on the approximation parameters. The aim of this work is to incorporate model-based Q-learning into a predictive control setup so as to provide closed-loop stability during online learning, while ultimately improving the performance of finite-horizon controllers. The proposed scheme provides nominal asymptotic stability, and the suggested learning approach was observed to improve performance over a baseline predictive controller. (c) 2020 European Control Association. Published by Elsevier Ltd. All rights reserved.
Pages: 167-178
Number of pages: 12
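
As a rough, hypothetical illustration of the scheme sketched in the abstract (a finite-horizon predictive controller whose parametric terminal cost is adapted online by a Q-learning-style temporal-difference update, so that it approaches the infinite-horizon cost-to-go), consider the Python sketch below. It is a minimal toy under assumed dynamics and weights, not the authors' algorithm: the system matrices, cost weights, horizon, learning rate, the sampled stand-in for an MPC solver, and the positive-semidefiniteness projection are all illustrative assumptions.

import numpy as np

# Toy setup: linear system x+ = A x + B u with quadratic stage cost.
# All numbers are illustrative assumptions, not values from the paper.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])      # hypothetical double integrator
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)                   # stage-cost state weight
R = np.array([[0.1]])           # stage-cost input weight
N = 10                          # prediction horizon
alpha = 0.02                    # temporal-difference learning rate

def stage_cost(x, u):
    return float(x @ Q @ x + u @ R @ u)

def terminal_cost(x, P):
    return float(x @ P @ x)    # parametric cost-to-go estimate V(x) = x' P x

def mpc_input(x0, P):
    # Crude stand-in for an MPC solver: score constant-input candidate
    # sequences by accumulated stage cost plus the learned terminal cost,
    # and return the best first input.
    best_u, best_J = 0.0, np.inf
    for u in np.linspace(-1.0, 1.0, 41):
        x, J = x0.copy(), 0.0
        for _ in range(N):
            J += stage_cost(x, np.array([u]))
            x = A @ x + B @ np.array([u])
        J += terminal_cost(x, P)
        if J < best_J:
            best_u, best_J = u, J
    return np.array([best_u])

# Online loop: act, observe the transition, and nudge P toward the
# Bellman relation V(x) = l(x, u) + V(x+) by a semi-gradient TD step.
P = np.eye(2)
x = np.array([1.0, 0.0])
for _ in range(50):
    u = mpc_input(x, P)
    x_next = A @ x + B @ u
    td_error = stage_cost(x, u) + terminal_cost(x_next, P) - terminal_cost(x, P)
    P = P + alpha * td_error * np.outer(x, x)    # dV/dP = x x'
    P = 0.5 * (P + P.T)                          # re-symmetrize
    # Clip eigenvalues so V stays a plausible Lyapunov candidate; this is
    # only a crude placeholder for rigorous stabilizing parameter constraints.
    w, U = np.linalg.eigh(P)
    P = U @ np.diag(np.clip(w, 0.0, None)) @ U.T
    x = x_next

print("final state:", x, "\nlearned terminal-cost matrix P:\n", P)

The eigenvalue clipping above merely keeps the learned cost positive semidefinite; a scheme with guaranteed stability, as in the paper, would instead enforce explicit conditions on the learned parameters so that the shaped cost certifies closed-loop stability during online learning.
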
Related papers
50 items in total
  • [31] Nested Q-learning of hierarchical control structures
    Digney, BL
    ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996: 1676 - 1681
  • [32] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [33] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [34] Discrete Time Formulation of Quasi Infinite Horizon Nonlinear Model Predictive Control Scheme with Guaranteed Stability
    Rajhans, Chinmay
    Patwardhan, Sachin C.
    Pillai, Harish
    IFAC PAPERSONLINE, 2017, 50 (01) : 7181 - 7186
  • [35] Pricing Scheme based Nash Q-Learning Flow Control for Multi-user Network
    Li, Xin
    Yu, Haibin
    MATERIALS, MECHATRONICS AND AUTOMATION, PTS 1-3, 2011, 467-469 : 847 - 852
  • [36] A Q-Learning Based Charging Scheduling Scheme for Electric Vehicles
    Dang, Qiyun
    Wu, Di
    Boulet, Benoit
    2019 IEEE TRANSPORTATION ELECTRIFICATION CONFERENCE AND EXPO (ITEC), 2019,
  • [37] A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
    Tondwalkar, Ankita
    Kwasinski, Andres
    IEEE ACCESS, 2025, 13 : 19678 - 19693
  • [38] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
    FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [39] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
    COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111 : 589 - 604
  • [40] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25