Probabilistic Constraint for Safety-Critical Reinforcement Learning

Cited by: 0
Authors
Chen W. [1 ]
Subramanian D. [2 ]
Paternain S. [1 ]
Affiliations
[1] Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute
Keywords
Approximation algorithms; Optimization; Probabilistic logic; Robots; Safety; Trajectory; Tuning;
DOI
10.1109/TAC.2024.3379246
Abstract
In this paper, we consider the problem of learning safe policies for probabilistic-constrained reinforcement learning (RL). Specifically, a safe policy or controller is one that, with high probability, maintains the trajectory of the agent within a given safe set. We establish a connection between this probabilistic-constrained setting and the cumulative-constrained formulation that is frequently explored in the existing literature, and we provide theoretical bounds showing that the probabilistic-constrained setting offers a better trade-off between optimality and safety (constraint satisfaction). The central challenge posed by probabilistic constraints is the absence of explicit expressions for their gradients. Our prior work derives such an explicit gradient expression, which we term Safe Policy Gradient-REINFORCE (SPG-REINFORCE). In this work, we provide an improved gradient estimator, SPG-Actor-Critic, whose variance is lower than that of SPG-REINFORCE, as substantiated by our theoretical results. A noteworthy aspect of both SPGs is their algorithm independence, which makes them applicable across a range of policy-based methods. Furthermore, we propose a Safe Primal-Dual algorithm that can leverage either SPG to learn safe policies, together with theoretical analyses covering the convergence of the algorithm as well as its near-optimality and feasibility on average. Finally, we evaluate the proposed approaches in a series of empirical experiments that examine the inherent trade-offs between optimality and safety, and that substantiate the efficacy of the two SPGs as well as our theoretical contributions.
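As a rough illustration of the primal-dual scheme the abstract describes, the sketch below runs a generic Lagrangian safe-RL loop on a toy 1-D random walk: a REINFORCE-style Monte Carlo gradient of the return plus a dual-weighted indicator of trajectory safety (a stand-in for the paper's SPG estimators, whose exact expressions are given in the paper itself). The environment, the function names `rollout` and `spg_primal_dual`, and all hyperparameters are invented for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=20, bound=5.0):
    """One trajectory of a 1-D walk under a Bernoulli policy sigmoid(theta).
    Returns (return, 1{trajectory stayed in [-bound, bound]}, d/dtheta log-prob)."""
    x, ret, glp, safe = 0.0, 0.0, 0.0, 1.0
    for _ in range(horizon):
        p_right = 1.0 / (1.0 + np.exp(-theta))        # P(step right)
        a = 1 if rng.random() < p_right else -1
        glp += (1.0 - p_right) if a == 1 else -p_right  # score of chosen action
        x += a + 0.1 * rng.normal()                   # noisy dynamics
        ret += x                                      # reward favors drifting right
        if abs(x) > bound:
            safe = 0.0                                # left the safe set
    return ret, safe, glp

def spg_primal_dual(delta=0.1, iters=200, batch=64, lr=0.005, lr_dual=0.05):
    """Primal ascent on the Lagrangian, dual descent on the probabilistic
    constraint P(trajectory safe) >= 1 - delta, estimated by Monte Carlo."""
    theta, lam = 0.0, 0.0
    for _ in range(iters):
        rets, safes, glps = map(np.array, zip(*(rollout(theta) for _ in range(batch))))
        # REINFORCE-style gradient of return + lam * P(safe)
        grad = np.mean((rets + lam * safes) * glps)
        theta += lr * grad                                            # primal step
        lam = max(0.0, lam - lr_dual * (safes.mean() - (1.0 - delta)))  # dual step
    return theta, lam
```

The dual variable grows whenever the empirical safety probability falls below 1 - delta, tilting the primal update toward safer policies; this is the trade-off between optimality and safety that the paper's experiments examine.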
Pages: 1-16
Related Papers (50 total)
  • [1] Zhang, Linrui; Zhang, Qin; Shen, Li; Yuan, Bo; Wang, Xueqian; Tao, Dacheng. Evaluating Model-Free Reinforcement Learning toward Safety-Critical Tasks. Thirty-Seventh AAAI Conference on Artificial Intelligence, 37(12), 2023: 15313-15321.
  • [2] Subramanian, S.; Tsai, W. T.; Rayadurgam, S. Design for constraint violation detection in safety-critical systems. Third IEEE International High-Assurance Systems Engineering Symposium, Proceedings, 1998: 109-116.
  • [3] Akella, Prithvi; Ubellacker, Wyatt; Ames, Aaron D. Probabilistic Guarantees for Nonlinear Safety-Critical Optimal Control. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023: 8120-8126.
  • [4] Johnson, C. W. A Probabilistic Logic for the Development of Safety-Critical, Interactive Systems. International Journal of Man-Machine Studies, 39(2), 1993: 333-351.
  • [5] Laprie, J. C.; Littlewood, B. Probabilistic Assessment of Safety-Critical Software: Why and How. Communications of the ACM, 35(2), 1992: 13+.
  • [6] Qin, Chunbin; Wu, Yinliang; Zhang, Jishi; Zhu, Tianzeng. Reinforcement Learning-Based Decentralized Safety Control for Constrained Interconnected Nonlinear Safety-Critical Systems. Entropy, 25(8), 2023.
  • [7] Riley, Joshua; Calinescu, Radu; Paterson, Colin; Kudenko, Daniel; Banks, Alec. Utilising Assured Multi-Agent Reinforcement Learning within Safety-Critical Scenarios. Knowledge-Based and Intelligent Information & Engineering Systems (KSE 2021), 192, 2021: 1061-1070.
  • [8] Lecerf, Ugo U. L.; Yemdji-Tchassi, Christelle C. Y.; Aubert, Sebastien S. A.; Michiardi, Pietro P. M. Automatically Learning Fallback Strategies with Model-Free Reinforcement Learning in Safety-Critical Driving Scenarios. Proceedings of the 7th International Conference on Machine Learning Technologies (ICMLT 2022), 2022: 209-215.
  • [9] Adler, Rasmus; Domis, Dominik J.; Foerster, Marc; Trapp, Mario. Probabilistic analysis of safety-critical adaptive systems with temporal dependences. Annual Reliability and Maintainability Symposium, 2008 Proceedings, 2008: 151+.
  • [10] Wang, Mengmeng; Zhang, Shuguang; Holzapfel, Florian; Loebl, David; Hellmundt, Fabian. Probabilistic Assessment of a Safety-Critical Backup Controller by Subset Simulation. Journal of Guidance, Control, and Dynamics, 42(5), 2019: 1146-1156.