Safe Policies for Reinforcement Learning via Primal-Dual Methods

Cited by: 17
Authors
Paternain, Santiago [1 ]
Calvo-Fullana, Miguel [2 ]
Chamon, Luiz F. O. [3 ]
Ribeiro, Alejandro [4 ]
Affiliations
[1] Rensselaer Polytech Inst, Elect Comp & Syst Engn, Troy, NY 12180 USA
[2] MIT, Dept Aeronaut & Astronaut, Cambridge, MA 02139 USA
[3] Univ Calif Berkeley, Berkeley, CA 94551 USA
[4] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
Keywords
Safety; trajectory; reinforcement learning; task analysis; optimal control; optimization; Markov processes; autonomous systems; gradient methods; unsupervised learning; approximation
DOI
10.1109/TAC.2022.3152724
Chinese Library Classification
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
In this article, we study the design of controllers in the context of stochastic optimal control under the assumption that a model of the system is not available. That is, we aim to control a Markov decision process whose transition probabilities are unknown, but for which we have access to sample trajectories through experience. We define safety as the agent remaining in a desired safe set with high probability during the operation time. The drawbacks of this formulation are twofold: the problem is nonconvex, and computing the gradients of the constraints with respect to the policies is prohibitive. Hence, we propose an ergodic relaxation of the constraints with the following advantages. 1) The safety guarantees are maintained in the case of episodic tasks, and they hold until a given time horizon for continuing tasks. 2) Despite its nonconvexity, the constrained optimization problem has an arbitrarily small duality gap if the parametrization of the controller is rich enough. 3) The gradients of the Lagrangian associated with the safe-learning problem can be computed using standard reinforcement learning results and stochastic approximation tools. Leveraging these advantages, we exploit primal-dual algorithms to find policies that are both safe and optimal. We test the proposed approach on a navigation task in a continuous domain. The numerical results show that our algorithm is capable of dynamically adapting the policy to the environment and the required safety levels.
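To make the primal-dual recipe described in the abstract concrete, the following is a minimal sketch in Python. It assumes a REINFORCE-style gradient estimator, a one-parameter Gaussian policy, toy one-dimensional "navigation" dynamics, and a single ergodic safety constraint of the form E[(1/T) Σ_t 1{s_t ∈ safe set}] ≥ 1 − δ. All names, dynamics, and step sizes are illustrative assumptions, not the setup or code of the paper.

```python
import numpy as np

# Minimal primal-dual sketch for a constrained (safe) RL problem.
# Primal step: policy-gradient ascent on the Lagrangian.
# Dual step: projected gradient ascent on the multiplier lambda >= 0.

rng = np.random.default_rng(0)

T, delta = 50, 0.1            # horizon and allowed unsafe fraction
theta = np.zeros(1)           # policy parameter (mean action) -- illustrative
lam = 0.0                     # dual variable (Lagrange multiplier)
lr_theta, lr_lam, sigma = 0.05, 0.05, 0.5

def rollout(theta):
    """Run one episode; return rewards, safety indicators, and score terms."""
    s = 0.0
    rewards, safe, scores = [], [], []
    for _ in range(T):
        a = theta[0] + sigma * rng.normal()       # Gaussian policy sample
        scores.append((a - theta[0]) / sigma**2)  # d log pi(a|theta) / d theta
        s = s + a + 0.1 * rng.normal()            # toy dynamics (assumed)
        rewards.append(-(s - 1.0) ** 2)           # drive the state toward 1.0
        safe.append(float(abs(s) <= 2.0))         # indicator of the safe set
    return np.array(rewards), np.array(safe), np.array(scores)

for it in range(2000):
    r, c, g = rollout(theta)
    # Lagrangian return: reward plus lambda-weighted ergodic safety term.
    lagr = r + lam * c / T
    # REINFORCE gradient of the Lagrangian w.r.t. theta (reward-to-go weighting).
    to_go = np.cumsum(lagr[::-1])[::-1]
    grad_theta = np.sum(g * to_go)
    theta += lr_theta * grad_theta / T
    # Dual update on the constraint violation, projected onto lambda >= 0.
    violation = (1.0 - delta) - c.mean()
    lam = max(0.0, lam + lr_lam * violation)
```

In this sketch the multiplier grows while the empirical safe fraction falls short of 1 − δ and shrinks toward zero otherwise, which mirrors the dynamic adaptation to the required safety level reported in the abstract.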
Pages: 1321-1336
Number of pages: 16