Learn Zero-Constraint-Violation Safe Policy in Model-Free Constrained Reinforcement Learning

Times Cited: 0
Authors
Ma, Haitong [1 ,2 ]
Liu, Changliu [3 ]
Li, Shengbo Eben [1 ,2 ]
Zheng, Sifa [1 ,2 ]
Sun, Wenchao [1 ,2 ]
Chen, Jianyu [4 ,5 ]
Affiliations
[1] Tsinghua Univ, State Key Lab Automot Safety & Energy, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Ctr Intelligent Connected Vehicles & Transportat, Beijing 100084, Peoples R China
[3] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[4] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing 100084, Peoples R China
[5] Shanghai Qizhi Inst, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Constrained reinforcement learning (RL); safe RL; safety index; zero-violation policy; optimization
DOI
10.1109/TNNLS.2023.3348422
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We focus on learning a zero-constraint-violation safe policy in model-free reinforcement learning (RL). Most existing model-free RL studies penalize dangerous actions a posteriori, which means the agent must experience danger in order to learn from it; consequently, they cannot learn a zero-violation safe policy even after convergence. To handle this problem, we leverage safety-oriented energy functions to learn zero-constraint-violation safe policies and propose the safe set actor-critic (SSAC) algorithm. The energy function is designed to increase rapidly for potentially dangerous actions, thereby locating the safe set in the action space. We can therefore identify dangerous actions before taking them and achieve zero constraint violation. Our major contributions are twofold. First, we use data-driven methods to learn the energy function, which removes the requirement of a known dynamics model. Second, we formulate a constrained RL problem to solve for zero-violation policies. We theoretically prove that our Lagrangian-based constrained RL solution converges to the constrained optimal zero-violation policy. The proposed algorithm is evaluated on complex simulation environments and in a hardware-in-the-loop (HIL) experiment with a real autonomous-vehicle controller. Experimental results suggest that the converged policies in all environments achieve zero constraint violation and performance comparable to the model-based baseline.
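To make the abstract's mechanism concrete, below is a minimal, self-contained sketch (not the authors' code) of a Lagrangian primal-dual update with a safety-index constraint of the kind SSAC-style methods use. The 1-D toy dynamics, the safety index phi, the decay margin eta, the linear policy, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(state):
    # Toy safety index (energy function): positive outside the safe set |s| <= 1.
    return abs(state) - 1.0

def step(state, action):
    # Hypothetical 1-D dynamics with drift noise, and a reward that
    # penalizes control effort. Both are illustrative, not from the paper.
    next_state = state + 0.1 * action + 0.05 * rng.normal()
    reward = -action ** 2
    return next_state, reward

eta = 0.1        # assumed decay margin for the energy-decrease constraint
lam = 0.0        # Lagrange multiplier
lr_lam = 0.05    # dual step size
theta = 0.0      # parameter of a linear policy a = -theta * s
lr_theta = 0.02  # primal step size

state = 0.8
for t in range(2000):
    action = -theta * state
    next_state, reward = step(state, action)

    # Constraint signal: the energy of the next state must not exceed
    # max(phi(s) - eta, 0), i.e., phi must decay while the state is unsafe.
    g = phi(next_state) - max(phi(state) - eta, 0.0)

    # Primal step: finite-difference descent on a deterministic surrogate
    # of the Lagrangian -r + lam * g (stands in for the actor update).
    def lagrangian(th):
        a = -th * state
        s2 = state + 0.1 * a
        return a ** 2 + lam * (phi(s2) - max(phi(state) - eta, 0.0))

    eps = 1e-3
    grad = (lagrangian(theta + eps) - lagrangian(theta - eps)) / (2 * eps)
    theta -= lr_theta * grad

    # Dual ascent: grow the multiplier while the constraint is violated,
    # shrink it (down to zero) when the constraint holds.
    lam = max(0.0, lam + lr_lam * g)
    state = next_state

print(f"theta={theta:.3f}  lambda={lam:.3f}  phi(final state)={phi(state):.3f}")
```

The dual-ascent step mirrors the abstract's claim: because the multiplier keeps growing while the energy-decrease constraint is violated, the converged policy is driven toward zero constraint violation rather than merely trading safety against reward with a fixed penalty.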
Pages: 2327-2341
Number of Pages: 15
Related Papers
50 records in total
  • [41] Model-Free Reinforcement Learning for Mean Field Games
    Mishra, Rajesh
    Vasal, Deepanshu
    Vishwanath, Sriram
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (04): 2141-2151
  • [42] Counterfactual Credit Assignment in Model-Free Reinforcement Learning
    Mesnard, Thomas
    Weber, Theophane
    Viola, Fabio
    Thakoor, Shantanu
    Saade, Alaa
    Harutyunyan, Anna
    Dabney, Will
    Stepleton, Tom
    Heess, Nicolas
    Guez, Arthur
    Moulines, Eric
    Hutter, Marcus
    Buesing, Lars
    Munos, Remi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [43] Model-free Policy Learning with Reward Gradients
    Lan, Qingfeng
    Tosatto, Samuele
    Farrahi, Homayoon
    Mahmood, A. Rupam
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022
  • [44] Model-Free Imitation Learning with Policy Optimization
    Ho, Jonathan
    Gupta, Jayesh K.
    Ermon, Stefano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016
  • [45] Covariance matrix adaptation for model-free reinforcement learning
    2013, 27
  • [46] Driving in Dense Traffic with Model-Free Reinforcement Learning
    Saxena, Dhruv Mauria
    Bae, Sangjae
    Nakhaei, Alireza
    Fujimura, Kikuo
    Likhachev, Maxim
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020: 5385-5392
  • [47] Model-Free Reinforcement Learning with Continuous Action in Practice
    Degris, Thomas
    Pilarski, Patrick M.
    Sutton, Richard S.
    2012 AMERICAN CONTROL CONFERENCE (ACC), 2012: 2177-2182
  • [48] Robotic Table Tennis with Model-Free Reinforcement Learning
    Gao, Wenbo
    Graesser, Laura
    Choromanski, Krzysztof
    Song, Xingyou
    Lazic, Nevena
    Sanketi, Pannag
    Sindhwani, Vikas
    Jaitly, Navdeep
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020: 5556-5563
  • [49] MODEL-FREE ONLINE REINFORCEMENT LEARNING OF A ROBOTIC MANIPULATOR
    Sweafford, Jerry, Jr.
    Fahimi, Farbod
    MECHATRONIC SYSTEMS AND CONTROL, 2019, 47 (03): 136-143
  • [50] Model-free H∞ control of Itô stochastic system via off-policy reinforcement learning
    Zhang, Weihai
    Guo, Jing
    Jiang, Xiushan
    AUTOMATICA, 2025, 174