Safe Policies for Reinforcement Learning via Primal-Dual Methods

Cited by: 17
Authors
Paternain, Santiago [1 ]
Calvo-Fullana, Miguel [2 ]
Chamon, Luiz F. O. [3 ]
Ribeiro, Alejandro [4 ]
Affiliations
[1] Rensselaer Polytech Inst, Elect Comp & Syst Engn, Troy, NY 12180 USA
[2] MIT, Dept Aeronaut & Astronaut, Cambridge, MA 02139 USA
[3] Univ Calif Berkeley, Berkeley, CA 94551 USA
[4] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
Keywords
Safety; Trajectory; Reinforcement learning; Task analysis; Optimal control; Optimization; Markov processes; Autonomous systems; Gradient methods; Unsupervised learning; Approximation
DOI
10.1109/TAC.2022.3152724
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this article, we study the design of controllers in the context of stochastic optimal control under the assumption that the model of the system is not available. That is, we aim to control a Markov decision process whose transition probabilities are unknown, but for which we have access to sample trajectories through experience. We define safety as the agent remaining in a desired safe set with high probability during the operation time. The drawbacks of this formulation are twofold: the problem is nonconvex, and computing the gradients of the constraints with respect to the policies is prohibitive. Hence, we propose an ergodic relaxation of the constraints with the following advantages. 1) The safety guarantees are maintained in the case of episodic tasks, and they hold until a given time horizon for continuing tasks. 2) Despite its nonconvexity, the constrained optimization problem has an arbitrarily small duality gap if the parametrization of the controller is rich enough. 3) The gradients of the Lagrangian associated with the safe learning problem can be computed using standard reinforcement learning results and stochastic approximation tools. Leveraging these advantages, we exploit primal-dual algorithms to find policies that are safe and optimal. We test the proposed approach in a navigation task in a continuous domain. The numerical results show that our algorithm is capable of dynamically adapting the policy to the environment and the required safety levels.
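
The abstract outlines a primal-dual scheme: ascend on the policy parameters with a policy-gradient estimate of the Lagrangian, and descend on the multiplier of the ergodically relaxed safety constraint, projecting it onto the nonnegative orthant. Below is a minimal sketch of that idea, not the authors' code: it assumes a toy one-dimensional navigation problem, a Gaussian policy, a REINFORCE-style score-function gradient, and illustrative constants (GOAL, SAFE_BOUND, DELTA, step sizes) chosen purely for demonstration.

    # Hedged sketch of primal-dual safe RL on a toy 1-D navigation MDP.
    # The environment, policy parametrization, and constants are assumptions
    # made for illustration; they are not taken from the paper.
    import numpy as np

    rng = np.random.default_rng(0)

    GAMMA, SIGMA, T = 0.99, 0.5, 50          # discount, policy std-dev, horizon
    GOAL, SAFE_BOUND, DELTA = 2.0, 3.0, 0.1  # goal position, safe set |x| <= 3, risk level

    def rollout(theta):
        """Run one episode with the Gaussian policy a ~ N(theta[0] + theta[1]*x, SIGMA^2).
        Returns the discounted reward return, the discounted safety return
        (discounted time spent in the safe set, i.e. the ergodic relaxation),
        and the accumulated score sum_t grad_theta log pi(a_t | x_t)."""
        x, ret_r, ret_s, score = 0.0, 0.0, 0.0, np.zeros(2)
        for t in range(T):
            mean = theta[0] + theta[1] * x
            a = rng.normal(mean, SIGMA)
            score += (a - mean) / SIGMA**2 * np.array([1.0, x])  # grad of Gaussian log-density
            x = x + a + 0.1 * rng.normal()                        # simple additive dynamics
            ret_r += GAMMA**t * (-abs(x - GOAL))                  # reward: approach the goal
            ret_s += GAMMA**t * float(abs(x) <= SAFE_BOUND)       # indicator of the safe set
        return ret_r, ret_s, score

    # Required safety level: discounted time in the safe set must reach
    # (1 - DELTA) of the maximum achievable discounted time.
    c = (1.0 - DELTA) * sum(GAMMA**t for t in range(T))

    theta, lam = np.zeros(2), 0.0      # primal (policy) and dual variables
    eta_theta, eta_lam = 1e-4, 5e-3    # step sizes

    for k in range(2000):
        ret_r, ret_s, score = rollout(theta)
        # Primal ascent on the Lagrangian L = V_r + lam * (V_s - c),
        # using a REINFORCE-style score-function gradient estimate.
        theta = theta + eta_theta * (ret_r + lam * ret_s) * score
        # Dual descent with projection onto lam >= 0.
        lam = max(0.0, lam - eta_lam * (ret_s - c))

    print("policy parameters:", theta, " multiplier:", lam)

The multiplier grows whenever the estimated discounted time in the safe set falls below the required level c, which weights safety more heavily in the next primal step; this trade-off is one way to read the abstract's claim that the policy adapts dynamically to the required safety levels.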
Pages: 1321 - 1336
Page count: 16
Related papers
50 items in total
  • [1] Learning Safe Policies via Primal-Dual Methods
    Paternain, Santiago
    Calvo-Fullana, Miguel
    Chamon, Luiz F. O.
    Ribeiro, Alejandro
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 6491 - 6497
  • [2] Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach
    Bai, Qinbo
    Bedi, Amrit Singh
    Agarwal, Mridul
    Koppel, Alec
    Aggarwal, Vaneet
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3682 - 3689
  • [3] Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
    Wai, Hoi-To
    Yang, Zhuoran
    Wang, Zhaoran
    Hong, Mingyi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [4] Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD
    Lee, Donghwan
    Yoon, Hyungjin
    Hovakimyan, Naira
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 1967 - 1972
  • [5] Interpreting Primal-Dual Algorithms for Constrained Multiagent Reinforcement Learning
    Tabas, Daniel
    Zamzam, Ahmed S.
    Zhang, Baosen
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [6] Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations
    Kamoutsi, Angeliki
    Banjac, Goran
    Lygeros, John
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [7] Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
    Qiu, Shuang
    Wei, Xiaohan
    Yang, Zhuoran
    Ye, Jieping
    Wang, Zhaoran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [8] Provably Efficient Safe Exploration via Primal-Dual Policy Optimization
    Ding, Dongsheng
    Wei, Xiaohan
    Yang, Zhuoran
    Wang, Zhaoran
    Jovanovic, Mihailo R.
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [9] Achieving Zero Constraint Violation for Concave Utility Constrained Reinforcement Learning via Primal-Dual Approach
    Bai, Qinbo
    Bedi, Amrit Singh
    Agarwal, Mridul
    Koppel, Alec
    Aggarwal, Vaneet
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2023, 78 : 975 - 1016
  • [10] Primal-dual methods for linear programming
    Univ. California San Diego, Dept. of Mathematics, La Jolla, CA 92093, United States
    MATHEMATICAL PROGRAMMING, SERIES B, 1995, 70 (3): 251 - 277