A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes

Cited by: 0
Authors
Khairy, Sami [1 ]
Balaprakash, Prasanna [2 ]
Cai, Lin X. [3 ]
Affiliations
[1] IIT, Chicago, IL 60616 USA
[2] Oak Ridge Natl Lab, Comp & Computat Sci Directorate, Oak Ridge, TN 37831 USA
[3] IIT, Dept Elect & Comp Engn, Chicago, IL 60616 USA
Keywords
Constrained Markov decision process (CMDP); gradient aware search (GAS); Lagrangian primal-dual optimization (PDO); piecewise linear convex (PWLC); ACTOR-CRITIC ALGORITHM; FUNCTION APPROXIMATION; POLICIES;
DOI
10.1109/TNNLS.2023.3315598
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The canonical solution methodology for finite constrained Markov decision processes (CMDPs), in which the objective is to maximize the expected infinite-horizon discounted reward subject to constraints on the expected infinite-horizon discounted costs, is based on linear programming (LP). In this brief, we first prove that the optimization objective in the dual linear program of a finite CMDP is a piecewise linear convex (PWLC) function of the Lagrange penalty multipliers. We then propose a novel, provably optimal, two-level gradient-aware search (GAS) algorithm that exploits the PWLC structure to find the optimal state-value function and Lagrange penalty multipliers of a finite CMDP. The proposed algorithm is applied to two constrained stochastic control problems and compared with binary search (BS), Lagrangian primal-dual optimization (PDO), and LP. Compared with the benchmark algorithms, the proposed GAS algorithm converges to the optimal solution quickly and without any hyperparameter tuning. In addition, its convergence speed is not sensitive to the initialization of the Lagrange multipliers.
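The idea in the abstract can be illustrated with a small sketch. For a single-constraint CMDP, the Lagrangian dual D(λ) = max_π [R^π − λ·C^π] + λ·b is a maximum over finitely many affine functions of λ, hence PWLC, and a subgradient at λ is b minus the discounted cost of the λ-greedy policy. The sketch below, a simplified reading of a gradient-aware search (the paper's GAS is the authoritative, two-level version), evaluates D(λ) by value iteration on the scalarized reward and moves to the intersection of the two bracketing tangents. The random CMDP instance, function names, and tolerances are illustrative assumptions, not from the paper.

```python
import numpy as np

def solve_lagrangian_mdp(P, r, c, lam, gamma):
    """Value iteration on the scalarized reward r - lam*c.
    Returns the optimal scalarized value V and the discounted cost Vc
    of the greedy policy. P has shape (n_s, n_a, n_s)."""
    n_s, n_a, _ = P.shape
    rl = r - lam * c
    V = np.zeros(n_s)
    for _ in range(5000):
        Q = rl + gamma * np.einsum('san,n->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-12:
            V = V_new
            break
        V = V_new
    pi = Q.argmax(axis=1)                       # greedy (deterministic) policy
    P_pi = P[np.arange(n_s), pi]                # transition matrix under pi
    c_pi = c[np.arange(n_s), pi]
    Vc = np.linalg.solve(np.eye(n_s) - gamma * P_pi, c_pi)  # policy cost
    return V, Vc

def dual(lam, P, r, c, gamma, mu, budget):
    """Dual objective D(lam) and one of its subgradients."""
    V, Vc = solve_lagrangian_mdp(P, r, c, lam, gamma)
    return mu @ V + lam * budget, budget - mu @ Vc

def gas(P, r, c, gamma, mu, budget, lam_hi=50.0, tol=1e-9):
    """Minimize the PWLC dual by repeatedly jumping to the intersection of
    the tangents at the bracketing points (assumes the constraint is
    satisfiable at lam_hi, so the subgradient bracket changes sign)."""
    lam_lo = 0.0
    d_lo, g_lo = dual(lam_lo, P, r, c, gamma, mu, budget)
    if g_lo >= 0:                               # constraint slack at lam = 0
        return lam_lo, d_lo
    d_hi, g_hi = dual(lam_hi, P, r, c, gamma, mu, budget)
    for _ in range(100):
        lam = ((d_lo - g_lo * lam_lo) - (d_hi - g_hi * lam_hi)) / (g_hi - g_lo)
        d, g = dual(lam, P, r, c, gamma, mu, budget)
        lower = d_lo + g_lo * (lam - lam_lo)    # tangent-intersection lower bound
        if d - lower < tol or g == 0:
            break
        if g < 0:
            lam_lo, d_lo, g_lo = lam, d, g
        else:
            lam_hi, d_hi, g_hi = lam, d, g
    return lam, d

# Hypothetical random CMDP instance for illustration.
rng = np.random.default_rng(0)
n_s, n_a, gamma = 4, 2, 0.9
P = rng.random((n_s, n_a, n_s)); P /= P.sum(axis=2, keepdims=True)
r, c = rng.random((n_s, n_a)), rng.random((n_s, n_a))
mu = np.full(n_s, 1.0 / n_s)                    # initial state distribution
# Pick a budget strictly between the minimum achievable cost and the cost
# of the reward-greedy policy, so the constraint is active but feasible.
_, Vc0 = solve_lagrangian_mdp(P, r, c, 0.0, gamma)
_, Vc_hi = solve_lagrangian_mdp(P, r, c, 50.0, gamma)
budget = 0.5 * (mu @ Vc0 + mu @ Vc_hi)
lam_star, d_star = gas(P, r, c, gamma, mu, budget)
```

Because the dual has only finitely many linear pieces (one per deterministic policy), the tangent-intersection search terminates after a handful of evaluations, which is the structural property the abstract credits for GAS converging quickly without hyperparameter tuning.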
Pages: 1-8
Page count: 8
Related Papers
(50 in total)
  • [1] An exact iterative search algorithm for constrained Markov decision processes
    Chang, Hyeong Soo
    [J]. AUTOMATICA, 2014, 50 (05) : 1531 - 1534
  • [2] Gradient-Aware Model-Based Policy Search
    D'Oro, Pierluca
    Metelli, Alberto Maria
    Tirinzoni, Andrea
    Papini, Matteo
    Restelli, Marcello
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3801 - 3808
  • [3] A Gradient-Aware Line Sampling Algorithm for LiDAR Scanners
    Nguyen, Xuan Truong
    Kim, Hyun
    Lee, Hyuk-Jae
    [J]. IEEE SENSORS JOURNAL, 2020, 20 (16) : 9283 - 9292
  • [4] On constrained Markov decision processes
    (author not listed) Department of Econometrics, University of Sydney, Sydney, NSW 2006, Australia
    [J]. Oper Res Lett, 1 (25-28):
  • [5] On constrained Markov decision processes
    Haviv, M
    [J]. OPERATIONS RESEARCH LETTERS, 1996, 19 (01) : 25 - 28
  • [6] Potential based optimization algorithm of constrained Markov decision processes
    Li Yanjie
    Yin Baoqun
    Xi Hongsheng
    [J]. Proceedings of the 24th Chinese Control Conference, Vols 1 and 2, 2005, : 433 - 436
  • [7] A Policy Gradient Approach for Finite Horizon Constrained Markov Decision Processes
    Guin, Soumyajit
    Bhatnagar, Shalabh
    [J]. 2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 3353 - 3359
  • [8] An actor-critic algorithm for constrained Markov decision processes
    Borkar, VS
    [J]. SYSTEMS & CONTROL LETTERS, 2005, 54 (03) : 207 - 213
  • [9] A Structure-aware Online Learning Algorithm for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    [J]. PROCEEDINGS OF THE 12TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2019), 2019, : 71 - 78
  • [10] Learning in Constrained Markov Decision Processes
    Singh, Rahul
    Gupta, Abhishek
    Shroff, Ness B.
    [J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (01): : 441 - 453