Actor-Critic Reinforcement Learning for Control With Stability Guarantee

Cited by: 52
Authors
Han, Minghao [1 ]
Zhang, Lixian [1 ]
Wang, Jun [2 ]
Pan, Wei [3 ]
Affiliations
[1] Harbin Inst Technol, Dept Control Sci & Engn, Harbin 150001, Peoples R China
[2] UCL, Dept Comp Sci, London WC1E 6BT, England
[3] Delft Univ Technol, Dept Cognit Robot, NL-2628 CD Delft, Netherlands
Keywords
Reinforcement learning; stability; Lyapunov's method; uniform ultimate boundedness; model-predictive control; mean-square stability; jump linear systems; delay
DOI
10.1109/LRA.2020.3011351
CLC Classification
TP24 [Robotics]
Subject Classification
080202; 1405
Abstract
Reinforcement Learning (RL) and its integration with deep learning have achieved impressive performance in various robotic control tasks, ranging from motion planning and navigation to end-to-end visual manipulation. However, stability is not guaranteed in model-free RL by solely using data. From a control-theoretic perspective, stability is the most important property of any control system, since it is closely related to the safety, robustness, and reliability of robotic systems. In this letter, we propose an actor-critic RL framework for control that guarantees closed-loop stability by employing the classic Lyapunov method from control theory. First, a data-based stability theorem is proposed for stochastic nonlinear systems modeled by Markov decision processes. Then we show that the stability condition can be exploited as the critic in actor-critic RL to learn a controller/policy. Finally, the effectiveness of our approach is evaluated on several well-known 3-dimensional robot control tasks and a synthetic biology gene network tracking task, across three popular physics simulation platforms. As an empirical evaluation of the advantage of stability, we show that the learned policies enable the systems to recover, to a certain extent, to the equilibrium or way-points when perturbed by uncertainties such as system parameter variations and external disturbances.
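The core idea in the abstract, that a data-based stability condition can serve as the critic, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the quadratic Lyapunov candidate, and the decrease margin `alpha` are all illustrative assumptions. The sketch only checks, from sampled transitions, the Lyapunov-style decrease condition L(s') - L(s) <= -alpha * c(s), whose average violation could act as a critic penalty for the actor.

```python
import numpy as np

# Hypothetical quadratic Lyapunov candidate L(s) = s^T W s (W positive definite).
def lyapunov_candidate(state, W):
    return state @ W @ state

def lyapunov_violation(states, next_states, costs, W, alpha=0.5):
    """Average violation of the sampled decrease condition
    L(s') - L(s) <= -alpha * c(s); 0 means every sampled transition
    satisfies it, positive values would penalize the policy."""
    decrease = np.array([
        lyapunov_candidate(sp, W) - lyapunov_candidate(s, W)
        for s, sp in zip(states, next_states)
    ])
    return np.mean(np.maximum(0.0, decrease + alpha * costs))

# Toy check on a stable linear system s' = 0.5 * s with cost c(s) = ||s||^2:
# L(s') - L(s) = -0.75 ||s||^2 <= -0.5 ||s||^2, so no violation is recorded.
rng = np.random.default_rng(0)
states = rng.normal(size=(64, 2))
next_states = 0.5 * states
costs = np.sum(states**2, axis=1)
W = np.eye(2)
violation = lyapunov_violation(states, next_states, costs, W, alpha=0.5)
```

In the letter's actual framework both the Lyapunov critic and the policy are learned neural networks; the fixed quadratic candidate above is only a stand-in to make the decrease condition concrete.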
Pages: 6217 - 6224
Page count: 8
Related Papers
50 records
  • [1] An Actor-Critic Framework for Online Control With Environment Stability Guarantee
    Osinenko, Pavel
    Yaremenko, Grigory
    Malaniya, Georgiy
    Bolychev, Anton
    [J]. IEEE ACCESS, 2023, 11 : 89188 - 89204
  • [2] Actor-Critic Reinforcement Learning for Tracking Control in Robotics
    Pane, Yudha P.
    Nageshrao, Subramanya P.
    Babuska, Robert
    [J]. 2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC), 2016, : 5819 - 5826
  • [3] Actor-critic reinforcement learning for the feedback control of a swinging chain
    Dengler, C.
    Lohmann, B.
    [J]. IFAC PAPERSONLINE, 2018, 51 (13): : 378 - 383
  • [4] A World Model for Actor-Critic in Reinforcement Learning
    Panov, A. I.
    Ugadiarov, L. A.
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, 2023, 33 (03) : 467 - 477
  • [5] Curious Hierarchical Actor-Critic Reinforcement Learning
    Roeder, Frank
    Eppe, Manfred
    Nguyen, Phuong D. H.
    Wermter, Stefan
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 408 - 419
  • [6] Actor-Critic based Improper Reinforcement Learning
    Zaki, Mohammadi
    Mohan, Avinash
    Gopalan, Aditya
    Mannor, Shie
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [7] Integrated Actor-Critic for Deep Reinforcement Learning
    Zheng, Jiaohao
    Kurt, Mehmet Necip
    Wang, Xiaodong
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 505 - 518
  • [8] A fuzzy Actor-Critic reinforcement learning network
    Wang, Xue-Song
    Cheng, Yu-Hu
    Yi, Jian-Qiang
    [J]. INFORMATION SCIENCES, 2007, 177 (18) : 3764 - 3781
  • [9] A modified actor-critic reinforcement learning algorithm
    Mustapha, SM
    Lachiver, G
    [J]. 2000 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CONFERENCE PROCEEDINGS, VOLS 1 AND 2: NAVIGATING TO A NEW ERA, 2000, : 605 - 609
  • [10] Research on actor-critic reinforcement learning in RoboCup
    Guo, He
    Liu, Tianying
    Wang, Yuxin
    Chen, Feng
    Fan, Jianming
    [J]. WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 205 - 205