A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes

Cited by: 0
Authors
Khairy, Sami [1 ]
Balaprakash, Prasanna [2 ]
Cai, Lin X. [3 ]
Affiliations
[1] IIT, Chicago, IL 60616 USA
[2] Oak Ridge Natl Lab, Comp & Computat Sci Directorate, Oak Ridge, TN 37831 USA
[3] IIT, Dept Elect & Comp Engn, Chicago, IL 60616 USA
Keywords
Constrained Markov decision process (CMDP); gradient aware search (GAS); Lagrangian primal-dual optimization (PDO); piecewise linear convex (PWLC); ACTOR-CRITIC ALGORITHM; FUNCTION APPROXIMATION; POLICIES;
DOI
10.1109/TNNLS.2023.3315598
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The canonical solution methodology for finite constrained Markov decision processes (CMDPs), in which the objective is to maximize the expected infinite-horizon discounted reward subject to constraints on the expected infinite-horizon discounted costs, is based on linear programming (LP). In this brief, we first prove that the optimization objective in the dual linear program of a finite CMDP is a piecewise linear convex (PWLC) function of the Lagrange penalty multipliers. We then propose a novel, provably optimal, two-level gradient-aware search (GAS) algorithm that exploits the PWLC structure to find the optimal state-value function and Lagrange penalty multipliers of a finite CMDP. The proposed algorithm is applied to two constrained stochastic control problems and compared with binary search (BS), Lagrangian primal-dual optimization (PDO), and LP. Compared with the benchmark algorithms, the proposed GAS algorithm converges to the optimal solution quickly and without any hyperparameter tuning. In addition, its convergence speed is not sensitive to the initialization of the Lagrange multipliers.
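The idea in the abstract can be illustrated with a small sketch. For a single-constraint CMDP, the Lagrangian dual D(λ) = max_π [R^π − λ·C^π] + λ·b is a maximum over finitely many affine functions of λ, hence PWLC, and a subgradient at λ is b minus the discounted cost of the λ-greedy policy. The sketch below, a simplified reading of a gradient-aware search (the paper's GAS is the authoritative, two-level version), evaluates D(λ) by value iteration on the scalarized reward and moves to the intersection of the two bracketing tangents. The random CMDP instance, function names, and tolerances are illustrative assumptions, not from the paper.

```python
import numpy as np

def solve_lagrangian_mdp(P, r, c, lam, gamma):
    """Value iteration on the scalarized reward r - lam*c.
    Returns the optimal scalarized value V and the discounted cost Vc
    of the greedy policy. P has shape (n_s, n_a, n_s)."""
    n_s, n_a, _ = P.shape
    rl = r - lam * c
    V = np.zeros(n_s)
    for _ in range(5000):
        Q = rl + gamma * np.einsum('san,n->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-12:
            V = V_new
            break
        V = V_new
    pi = Q.argmax(axis=1)                       # greedy (deterministic) policy
    P_pi = P[np.arange(n_s), pi]                # transition matrix under pi
    c_pi = c[np.arange(n_s), pi]
    Vc = np.linalg.solve(np.eye(n_s) - gamma * P_pi, c_pi)  # policy cost
    return V, Vc

def dual(lam, P, r, c, gamma, mu, budget):
    """Dual objective D(lam) and one of its subgradients."""
    V, Vc = solve_lagrangian_mdp(P, r, c, lam, gamma)
    return mu @ V + lam * budget, budget - mu @ Vc

def gas(P, r, c, gamma, mu, budget, lam_hi=50.0, tol=1e-9):
    """Minimize the PWLC dual by repeatedly jumping to the intersection of
    the tangents at the bracketing points (assumes the constraint is
    satisfiable at lam_hi, so the subgradient bracket changes sign)."""
    lam_lo = 0.0
    d_lo, g_lo = dual(lam_lo, P, r, c, gamma, mu, budget)
    if g_lo >= 0:                               # constraint slack at lam = 0
        return lam_lo, d_lo
    d_hi, g_hi = dual(lam_hi, P, r, c, gamma, mu, budget)
    for _ in range(100):
        lam = ((d_lo - g_lo * lam_lo) - (d_hi - g_hi * lam_hi)) / (g_hi - g_lo)
        d, g = dual(lam, P, r, c, gamma, mu, budget)
        lower = d_lo + g_lo * (lam - lam_lo)    # tangent-intersection lower bound
        if d - lower < tol or g == 0:
            break
        if g < 0:
            lam_lo, d_lo, g_lo = lam, d, g
        else:
            lam_hi, d_hi, g_hi = lam, d, g
    return lam, d

# Hypothetical random CMDP instance for illustration.
rng = np.random.default_rng(0)
n_s, n_a, gamma = 4, 2, 0.9
P = rng.random((n_s, n_a, n_s)); P /= P.sum(axis=2, keepdims=True)
r, c = rng.random((n_s, n_a)), rng.random((n_s, n_a))
mu = np.full(n_s, 1.0 / n_s)                    # initial state distribution
# Pick a budget strictly between the minimum achievable cost and the cost
# of the reward-greedy policy, so the constraint is active but feasible.
_, Vc0 = solve_lagrangian_mdp(P, r, c, 0.0, gamma)
_, Vc_hi = solve_lagrangian_mdp(P, r, c, 50.0, gamma)
budget = 0.5 * (mu @ Vc0 + mu @ Vc_hi)
lam_star, d_star = gas(P, r, c, gamma, mu, budget)
```

Because the dual has only finitely many linear pieces (one per deterministic policy), the tangent-intersection search terminates after a handful of evaluations, which is the structural property the abstract credits for GAS converging quickly without hyperparameter tuning.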
Pages: 1-8
Page count: 8
Related Papers
(50 in total)
  • [1] An exact iterative search algorithm for constrained Markov decision processes
    Chang, Hyeong Soo
    [J]. AUTOMATICA, 2014, 50 (05) : 1531 - 1534
  • [2] Gradient-Aware Model-Based Policy Search
    D'Oro, Pierluca
    Metelli, Alberto Maria
    Tirinzoni, Andrea
    Papini, Matteo
    Restelli, Marcello
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3801 - 3808
  • [3] A Gradient-Aware Line Sampling Algorithm for LiDAR Scanners
    Nguyen, Xuan Truong
    Kim, Hyun
    Lee, Hyuk-Jae
    [J]. IEEE SENSORS JOURNAL, 2020, 20 (16) : 9283 - 9292
  • [4] On constrained Markov decision processes
    (author not listed) Department of Econometrics, University of Sydney, Sydney, NSW 2006, Australia
    [J]. Oper Res Lett, 1 (25-28):
  • [5] On constrained Markov decision processes
    Haviv, M
    [J]. OPERATIONS RESEARCH LETTERS, 1996, 19 (01) : 25 - 28
  • [6] Potential based optimization algorithm of constrained Markov decision processes
    Li Yanjie
    Yin Baoqun
    Xi Hongsheng
    [J]. Proceedings of the 24th Chinese Control Conference, Vols 1 and 2, 2005, : 433 - 436
  • [7] A Policy Gradient Approach for Finite Horizon Constrained Markov Decision Processes
    Guin, Soumyajit
    Bhatnagar, Shalabh
    [J]. 2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 3353 - 3359
  • [8] An actor-critic algorithm for constrained Markov decision processes
    Borkar, VS
    [J]. SYSTEMS & CONTROL LETTERS, 2005, 54 (03) : 207 - 213
  • [9] A Structure-aware Online Learning Algorithm for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    [J]. PROCEEDINGS OF THE 12TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2019), 2019, : 71 - 78
  • [10] Learning in Constrained Markov Decision Processes
    Singh, Rahul
    Gupta, Abhishek
    Shroff, Ness B.
    [J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (01): : 441 - 453