A reinforcement learning method with closed-loop stability guarantee

Cited by: 5
Authors
Osinenko, Pavel [1 ]
Beckenbach, Lukas [1 ]
Goehrt, Thomas [1 ]
Streif, Stefan [1 ]
Affiliations
[1] Tech Univ Chemnitz, Automat Control & Syst Dynam Lab, Chemnitz, Germany
Source
IFAC-PAPERSONLINE, 2020, Vol. 53, No. 2
Keywords
Reinforcement learning control; Stability of nonlinear systems; Lyapunov methods; ITERATION;
DOI
10.1016/j.ifacol.2020.12.2237
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Subject classification code
0812
Abstract
Reinforcement learning (RL) in the context of control systems offers wide possibilities for controller adaptation. Given an infinite-horizon cost function, the so-called critic of RL approximates it with a neural net and sends this information to the controller (called the "actor"). However, the issue of closed-loop stability under an RL method is still not fully addressed. Since the critic delivers merely an approximation to the value function of the corresponding infinite-horizon problem, no guarantee can be given in general as to whether the actor's actions stabilize the system. Different approaches to this issue exist. The current work offers a particular one, which, starting with a (not necessarily smooth) control Lyapunov function (CLF), derives an online RL scheme in such a way that a practical semi-global stability property of the closed loop can be established. The approach logically continues the authors' work on parameterized controllers and Lyapunov-like constraints for RL, whereas the CLF now appears merely in one of the constraints of the control scheme. The analysis of the closed-loop behavior is done in a sample-and-hold (SH) manner, thus offering a certain insight into the digital realization. The case study with a non-holonomic integrator shows the capability of the derived method to optimize the given cost function compared to a nominal stabilizing controller. Copyright (C) 2020 The Authors.
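To illustrate the general pattern the abstract describes (a critic approximating the infinite-horizon cost, an actor choosing inputs subject to a CLF decrease constraint, all run in a sample-and-hold loop), the following is a minimal sketch. The single-integrator dynamics, the quadratic CLF, the critic features, the decay rate, and the learning rate are all illustrative assumptions and not the authors' scheme; the paper itself allows a nonsmooth CLF and uses a non-holonomic integrator in its case study.

```python
# Hedged sketch of a CLF-constrained actor-critic step in sample-and-hold form.
# All concrete modeling choices below are assumptions for illustration only.
import numpy as np
from scipy.optimize import minimize

dt = 0.1   # sample-and-hold period (assumed)
nu = 0.1   # required relative CLF decay per step (assumed)

def f(x, u):
    # Single-integrator dynamics x_dot = u, chosen only for illustration;
    # the paper's case study uses a non-holonomic integrator instead.
    return u

def step(x, u):
    # Sample-and-hold: the control input is frozen over one sampling interval.
    return x + dt * f(x, u)

def stage_cost(x, u):
    return x @ x + 0.1 * (u @ u)

def V(x):
    # Control Lyapunov function candidate (quadratic here; the paper allows
    # a nonsmooth CLF).
    return x @ x

def features(x):
    # Critic basis: linear-in-parameters quadratic features (assumed).
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def critic(x, w):
    # Critic: approximation of the infinite-horizon cost-to-go.
    return w @ features(x)

def actor(x, w):
    # Actor: minimize stage cost plus the critic's estimate at the next state,
    # subject to a CLF decrease constraint that enforces stability regardless
    # of how accurate the critic currently is.
    def obj(u):
        return stage_cost(x, u) + critic(step(x, u), w)
    clf_decrease = {"type": "ineq",
                    "fun": lambda u: (1.0 - nu * dt) * V(x) - V(step(x, u))}
    res = minimize(obj, np.zeros(2), method="SLSQP",
                   constraints=[clf_decrease])
    return res.x

def critic_update(x, u, xn, w, lr=0.05):
    # Semi-gradient temporal-difference update of the critic weights.
    td_error = stage_cost(x, u) + critic(xn, w) - critic(x, w)
    return w + lr * td_error * features(x)

# Closed-loop simulation from a nonzero initial state.
x = np.array([1.0, -0.5])
w = np.zeros(3)
for _ in range(100):
    u = actor(x, w)
    xn = step(x, u)
    w = critic_update(x, u, xn, w)
    x = xn
print("final state:", x)
```

The point of the sketch is the constrained actor step: even while the critic weights are still inaccurate, every applied input must decrease the CLF by a prescribed margin over the sampling interval, which is what yields the stability guarantee independently of the learning progress.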
Pages: 8043-8048
Page count: 6